ABSTRACT



The Computerized Academic Counseling System (CACS) was developed as a
counseling aid. The system was designed to provide in-depth analysis and forecasting of
student performance as an aid to counselors in assisting students in the selection of
academic majors in which they are most likely to succeed.

CACS was specially geared to meet the requirements of an advanced program of
personnel management. It brings valid and comprehensive data analysis capability into the
academic counselor's hands in a timely and efficient manner. It assists him in performing
the guidance function and in conducting the research needed to advance knowledge in
vital areas of human resource development. It provides:

1. Timely access to a comprehensive array of counseling information

2. User-oriented interface procedures

3. Readily interpretable displays

4. Flexibility of operation and maintenance

5. Modular expansion capability
AUTOMATIC DATA PROCESSING SYSTEM AND PROCEDURES
COMPUTERIZED ACADEMIC COUNSELING SYSTEM 



I. INTRODUCTION 

General 

This report describes the design, development
and evaluation of the Computerized Academic
Counseling System (CACS). Specifically designed
as a counseling tool for the college level, CACS
consists, in part, of a multiple linear regression
equation whose outputs are:

1. The predicted likelihood of success for a 
college student in any or all of the major fields of 
study; 

2. a predicted point estimate of a student's 
Grade Point Average (GPA) in each major; and 

3. a standard error of estimate for each
predicted point estimate.

In addition, CACS provides, on request, a list of 
predictor values that are used in performing the 
GPA and probability of success calculations, and a 
summary of the student's academic grades. 

CACS is a modularized system written in
COBOL for use on a Burroughs B-3500 computer.
The purpose of modularizing the program and
writing it in COBOL was to (a) enable the system
to be easily modified as curricula and majors
change, and (b) permit adaptation of the system to
operate on a variety of different computers with a
minimum of reprogramming. In effect, CACS is a
versatile model that is easy to use and capable of
modification and expansion; it will provide
information that can be used by a counselor to
successfully guide a student.

Problem Statement 

It is generally agreed that not all high school
graduates have the intellectual capability to
succeed in college. Moreover, regardless of the
number which might be viewed as potentially
successful college students, available instructor
staff and facilities pose severe constraints on the 
number of applicants that can be accepted by 
colleges. As a consequence, colleges generally 
accept only those high school graduates who have 
established a high grade point average. Thus, it is 
(at least implicitly) assumed that prior academic 
history is an essential and perhaps sufficient con- 
dition for predicting success in college. 



High school students, junior college graduates,
etc., who are precluded from attending college
because of low academic grades, often are
permitted a second opportunity, contingent on the
results of a college administered battery of tests.
Such tests usually consist of paced, simple
arithmetic problems, vocabulary evaluations, verbal
comprehension and reasoning problems, and
determinations of aptitude. Test batteries are validated
and a cutoff line is drawn dichotomizing overall 
criteria scores into those which are acceptable and 
those which are unacceptable; it is assumed with a 
reasonable degree of confidence that students 
whose scores are below the cutoff line will not 
succeed in college. 

Despite considerable efforts to develop
methodologies which tend to ensure that accepted
applicants will succeed in college, the combined
flunk-out and drop-out rate is relatively high,
reaching as much as 50 percent in some universities.

At least three major factors can be delineated as 
contributing to student failure. First, many 
students enter these institutions with little or no 
consideration given to establishing academic (and 
future professional) goals or interests. As a result, 
such students perform poorly, change their major 
(and curriculum) frequently, and eventually flunk 
out or drop out. Second, many students simply 
demonstrate little motivation to study and their 
low grades provide cause for early termination. 
Finally, it is well recognized that the above noted
selection process is fallible, permitting some
applicants to filter through who are not, in fact, suited
for the university environment for a host
of reasons, not the least of which is a lack of the specific
intellectual capability to master the academic
requirements.

Objective 

Two implications can be drawn from the prob- 
lem statement described above. First, prior 
academic history and test batteries provide reason- 
ably good predictors of achievement in college, 
albeit with less accuracy than desired but far 
greater than that achievable by other techniques. 
Second, prediction success may increase
dramatically if the "evaluation instrument" specifically
takes into account student aptitudes and other
factors related to curricula, thus maximizing the
likelihood that students will select the correct
curriculum.

In essence, the System Development Corporation
(SDC) sought to develop a predictor instrument
that could be used by counselors at any time
to predict a student's likelihood of success in any 
of the major fields of study offered. Such an 
instrument would serve as an aid to counselors 
during the process of advising and counseling 
college students with respect to selecting the most 
appropriate major fields of study. 

Design Rationale 

General Design Criteria 

The primary purpose of developing CACS was
to provide counselors with a tool for facilitating
the achievement of a basic counseling objective;
namely, the selection of the most appropriate
major for each student. Emphasis is placed on the
term tool because the current state-of-the-art in
predicting academic performance by mathematical
modeling techniques is insufficient by itself, and
we believe that the expertise and experiences of
academic counselors should comprise the major
influence on students seeking guidance. However,
since academic performance is a function of many
interrelated variables, the complexities of which
cannot be precisely calculated and resolved by
"pure" counseling, any objective tool that can
reduce such complexities should facilitate the 
counseling process. The overall design criterion 
used in the development of CACS, therefore, was 
that it be a facilitator in the counseling process. 

To be generally useful, the design of CACS 
would also have to comply with several specific 
and interrelated criteria. Such criteria are briefly 
discussed in the following sections. 

Specific Design Criteria 

1. Long life expectancy. The time and costs of
developing a model such as CACS necessitate
that it be designed for a relatively long life span.
Thus, it was necessary to build into the model a
modification capability in anticipation of a wide
variety of possible changes.

2. Ease of expansion. CACS was designed to
account for current, available input data on each
student; however, the model was also designed to
accommodate additional inputs that would likely
become available over time. Furthermore, a
modularized design approach was used to add
model flexibility; thus, modules can be eliminated,
added or modified, as required.

3. Ease of updating. Long life span and
flexibility gain increasing importance as the ease of
updating the model's program increases. CACS
would be unlikely to survive more than one or two
updates if the reprogramming tasks were
unwieldy and difficult. Since many types of changes
are expected to occur on a relatively frequent basis
(e.g., modifications of majors, changes in formulae
for computing GPA), it is apparent that simplicity
in modifying the program was an essential design
criterion.

4. Simplicity of use. The achievement of the
above design criteria would be offset if model
usage was too complex or time consuming. Thus,
CACS was designed so that it can be readily
employed with little effort on the part of the
counselor or other college staff personnel.

5. Timely response. A final design criterion was
the requirement for timely response. Clearly, the
value of a tool is questionable if its availability is
low when it is needed. Accessibility to all
pertinent information should consume no more time
than that normally occurring between the
establishment of a counseling appointment and the
actual counseling. CACS was, therefore, designed
to provide, at maximum, an overnight response
time.

Representative Applications 
And Examples 

CACS is primarily intended to serve as a tool
for the academic counselor who is advising a 
student with regard to the selection of an aca- 
demic major. However, it can also provide diagnos- 
tic information for the analysis of student 
academic problems and data for institutional 
research. Samples of each of these three immediate 
CACS applications (academic counseling, diagnos- 
tic evaluation and institutional research) are given 
in the examples below. Additional applications for 
CACS are sure to evolve once the system becomes 
fully operational. 

Academic Counseling Application 

For this example, let us consider the following
counseling situation. A student is in his third class
year and has not yet selected an academic major.
He contacts his counselor for assistance in making
his selection. If an appointment is made in
advance, the counselor may request data via the CACS
batch processing capability for review before
talking with the student. Or, if the student did not
make an advance appointment, or the counselor
does not wish to review the student's data prior to
the counseling session, he may request data
directly from the system via his terminal during
the counseling session. At the terminal the
counselor has the option of either CRT (television
screen display) or hard copy (teletype) output, or
both. The printouts or displays the counselor
receives in response to either batch or on-line
request are identical in content and format. The
following is an example of a typical on-line session
where the counselor enters his request via the
terminal.

1. Predicted GPA option. Suppose student
number 741234, John D. Brown, has requested
counseling and has indicated he would like to
major in mathematics. To determine his predicted
GPA and probability of success in mathematics,
the counselor enters the following request at his
terminal:

GPAS 741234 MATH

CACS responds with the predicted GPA,
standard error, and probability of success.



Example:

BROWN J D                    741234

MAJOR       EST     STD      PROB
            GPA     ERROR    SUCCESS
MATH        2.55    .39      .92



A second alternative for obtaining this data is
to request this information in all 28 majors by
entering the following request:

GPAS 741234

The CACS response to this request is a list of
predicted GPAs in all 28 majors.
Example:

BROWN J D                    741234

MAJOR       EST     STD      PROB
            GPA     ERROR    SUCCESS
AERO        2.80    .30      .99
AMERSTU     2.64    .31      .98
ASTRO       2.95    .27      .99
BASSCI      2.63    .42      .94
CHEM        2.82    .43      .97
CIVENGR     3.09    .37      .99
COMPSCI     2.69    .42      .95
ECON        2.67    .33      .98
ELENGR      3.06    .35      .99
ENGRMGT     2.72    .33      .99
ENGRSCI     2.62    .30      .98
FAREAST     2.52    .37      .92
GENENGR     2.80    .36      .99
GENSTU      2.54    .30      .96
GEOG        2.70    .45      .94
HISTORY     2.71    .42      .96
HUM         2.88    .37      .99
INTAFF      2.38    .30      .90
LATAMER     2.71    .27      .99
LIFESCI     2.98    .31      .99
MATH        2.55    .39      .92
MECH        2.99    .32      .99
MILARTSC    2.95    .29      .99
POLSCI      2.78    .35      .99
PHYSICS     2.82    .39      .98
PSYCH       2.81    .38      .98
SOVSTU      2.61    .35      .96
WESTEUR     2.43    .34      .89



Two additional printouts are available to assist
the counselor in advising students. These are the
grade and predictor summaries.

2. Grade summary option. Grade summaries
may be requested for a single department or for
the student's entire history. For example, if the
counselor wants to examine the grades in
mathematics, he could enter the request:

GRAD 741234 MATH

The CACS response to this request is a list of
courses taken in the specified department.
Example:

BROWN J D                    741234

COURSE      NO.     HOURS    GRADE
MATH        100     5.50     X
MATH        161     6.00*    B
MATH        162     7.50*    B
MATH        232     2.50*    C
MATH        260     3.00*    B
TOTAL HOURS        19.00     GPA 2.87



To request a grade summary for the entire 
academic history, the counselor would enter the 
request: 

GRAD 741234 

CACS response to this request is a list of all 
courses the student has taken since entering. (Only 
a few of the first and last courses are shown in the 
sample below.) The printout is ordered by depart- 
ment. Courses flagged with an asterisk (*) beside 
the number of hours are those used in computing 
the total hours for each department and the group 
total shown. 



Example:

BROWN J D                    741234

COURSE      NO.     HOURS    GRADE
CHEM        121     2.50*    A
CHEM        122     3.00*    A
TOTAL HOURS         5.50     GPA 4.00
PHYSICS     220     5.00*    B
TOTAL HOURS         5.00     GPA 3.00
POL SCI     211     2.50*    C
POL SCI     212     3.00*    C
TOTAL HOURS         5.50     GPA 2.00
PSYCH       100     2.50*    C
TOTAL HOURS         2.50     GPA 2.00

GROUP TOTAL HRS   141.25     GPA 2.63



If the counselor wishes to know what
predictors are influencing the GPA in a given major,
he can request a predictor summary for that
major. The predictors in MATH for our sample
student would be obtained by entering the
following request:

PRED 741234 MATH

The CACS response to this request is a list of
predictor values for the specified student. Predictors
flagged with an asterisk (*) are those used in
making the prediction for the specified major (in
this case, MATH).



Example:

BROWN J D                      741234

GPA - MATH                     2.76*
GPA - OTHER BASIC SCIENCES     3.23*
GPA - ENG SCIENCES             3.00*
GPA - HUMANITIES               2.38
GPA - SOCIAL STUDIES           2.00*
FALCON/SKELLY SCHOLARSHIP      0
TURNBACK INDICATOR             0
ESTIMATE AGE AT GRADUATION     22
PRIOR ACADEMIC ACHIEVEMENT     575*
VERBAL APTITUDE                466
ENGLISH COMPOSITION            485
COMPOSITE ENGLISH SCORE        0951
MATH APTITUDE                  684
INTERMEDIATE/ADV MATH CODE     1
MATH ACHIEVEMENT               698*
COMPOSITE MATH SCORE           1382
ACADEMIC COMPOSITE             2908
PAE SCORE                      490
ACTIVITIES - ATHLETIC          570
ACTIVITIES - NON-ATHLETIC      490
LEADERSHIP COMPOSITE           1550
WEIGHTED COMPOSITE             562
MEDICAL QUALIFICATION CODE     1
ACADEMY PREP SCH ATTENDED      0
OTHER PREP SCH ATTENDED        0
COLLEGE ATTENDED CODE          0



If a list of the predictors without regard to any
specific major is desired, the counselor can input 
the request: 

PRED 741234 
In this case CACS will respond with a list of
predictor values without any asterisks as in the 
example below. 



BROWN J D                      741234

GPA - MATH                     2.76
GPA - OTHER BASIC SCIENCES     3.23
GPA - ENG SCIENCES             3.00
GPA - HUMANITIES               2.38
GPA - SOCIAL STUDIES           2.00
FALCON/SKELLY SCHOLARSHIP      0
TURNBACK INDICATOR             0
ESTIMATE AGE AT GRADUATION     22
PRIOR ACADEMIC ACHIEVEMENT     575
VERBAL APTITUDE                466
ENGLISH COMPOSITION            485
COMPOSITE ENGLISH SCORE        0951
MATH APTITUDE                  684
INTERMEDIATE/ADV MATH CODE     1
MATH ACHIEVEMENT               698
COMPOSITE MATH SCORE           1382
ACADEMIC COMPOSITE             2908
PAE SCORE                      490
ACTIVITIES - ATHLETIC          570
ACTIVITIES - NON-ATHLETIC      490
LEADERSHIP COMPOSITE           1550
WEIGHTED COMPOSITE             562
MEDICAL QUALIFICATION CODE     1
ACADEMY PREP SCH ATTENDED      0
OTHER PREP SCH ATTENDED        0
COLLEGE ATTENDED CODE          0



Diagnostic Evaluation Application 

The CACS system can very readily be used as a
diagnostic tool by academic counselors. Both the
on-line and the batch mode of processing are
available for this purpose. For example, consider the
situation where the counselor receives a list of
students who are experiencing academic
difficulties in their major.

Let us assume a fictitious student in his junior
year, R. J. Green, student number 741235, is
having difficulty as an electrical engineering major.
The counselor can examine his grades in electrical 
engineering courses by entering the following 
request: 



Example:

GREEN R J                    741235

COURSE      NO.     HOURS    GRADE
EL ENGR     333     2.50*    C
EL ENGR     334     3.00*    D
TOTAL HOURS         5.50     GPA 1.4



Since the number of specific courses in elec- 
trical engineering is small, the counselor may wish 
to review all the courses the student has taken 
since he entered the school. To obtain this infor- 
mation he enters the request: 

GRAD 741235 

The CACS response to this request is a list of all 
courses taken by the student summarized by 
departments. 



All of the information obtained in the on-line
examples, above, could also be obtained by 
keypunching the requests and submitting the cards 
for batch processing. The resulting printouts 
would be identical to the examples above. 
Example:

GREEN R J                    741235

COURSE      NO.     HOURS    GRADE
AERO        331     2.50*    D
AERO        332     3.00*    D
TOTAL HOURS         5.50     GPA 1.00
ASTRO       432
TOTAL HOURS         2.50     GPA 2.00
CHEM        101     2.50*    C
CHEM        102     3.00*    C
TOTAL HOURS         5.50     GPA 2.00
  .          .        .
  .          .        .
LIFE SCI    210     2.50*    B
TOTAL HOURS         2.50     GPA 3.00
MIL TNG     115     0.50*    D
MIL TNG     116     0.50*    C
MIL TNG     220     1.00     N
MIL TNG     220     1.00*    B
MIL TNG     320     2.00*    C
MIL TNG     320     2.00     N
TOTAL HOURS         4.00     GPA 2.13
  .          .        .
SOC         304     0.50*    B
TOTAL HOURS         0.50     GPA 3.00
SPANISH     101     2.50*    C
SPANISH     102     3.00*    C
TOTAL HOURS         5.50     GPA 2.00

GROUP TOTAL HRS   128.75     GPA 2.06



The asterisks, which appear by the class hours
in the display, indicate that the course is used in the
computation of the GPA.
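The role of the asterisk can be illustrated with a minimal Python sketch. The report does not print the letter-to-point scale, so the mapping A=4 through F=0 is an assumption (it is consistent with the all-A chemistry entry showing 4.00), and the function name `department_summary` is hypothetical. The data are the MATH courses from the grade summary for student 741234.

```python
# Assumed letter-to-point scale; the report itself does not list it.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def department_summary(courses):
    """Return (total_hours, gpa) using only the asterisk-flagged courses.

    courses: iterable of (hours, counted, grade), where counted mirrors
    the asterisk printed beside the hours.
    """
    counted = [(hours, grade) for hours, flag, grade in courses if flag]
    total_hours = sum(hours for hours, _ in counted)
    if total_hours == 0:
        return 0.0, None  # no counted courses, so no GPA can be formed
    quality_points = sum(hours * GRADE_POINTS[grade] for hours, grade in counted)
    return total_hours, quality_points / total_hours


# MATH rows from the sample printout: (hours, asterisk?, grade)
math_courses = [
    (5.50, False, "X"),
    (6.00, True, "B"),
    (7.50, True, "B"),
    (2.50, True, "C"),
    (3.00, True, "B"),
]

hours, gpa = department_summary(math_courses)
print(f"TOTAL HOURS {hours:.2f} GPA {gpa:.2f}")  # TOTAL HOURS 19.00 GPA 2.87
```

Under these assumptions the unflagged MATH 100 row is excluded and the totals agree with the printed 19.00 hours and 2.87 GPA.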

CACS can again be utilized to assist counselors
in discovering solutions to these problems and also
to assist the counselor in planning a more fitting
program for the student. Examine the case of the
student majoring in electrical engineering, as
mentioned above. He is in his 2nd class year and
has a limited amount of time before graduation;
therefore, some changes in his curriculum may be
suggested to better facilitate his progress. Provided
the student has sufficient time remaining before
graduation and his schedule can be arranged to
accommodate the necessary subjects, a change in
academic major may be a potential solution. The
counselor can examine the likelihood of success
for the student in other majors by entering the
request:

GPAS 741235

The CACS response to this request is a list of
predicted GPAs and the related standard error and
probability of success for the specified student in
each of the 28 majors.



Example:

GREEN R J                    741235

MAJOR       EST     STD      PROB
            GPA     ERROR    SUCCESS
AERO        1.92    .30      .39
AMERSTU     2.50    .31      .95
ASTRO       1.58    .27      .06
BASSCI      2.02    .42      .52
CHEM        1.62    .43      .19
CIVENGR     2.23    .37      .73
COMPSCI     2.09    .42      .58
ECON        2.17    .33      .70
ELENGR      2.13    .35      .65
ENGRMGT     2.30    .33      .82
ENGRSCI     1.72    .30      .18
FAREAST     2.43    .37      .88
GENENGR     2.05    .36      .56
GENSTU      2.20    .30      .75
GEOG        2.27    .45      .11
HISTORY     2.56    .42      .91
HUM         2.63    .37      .96
INTAFF      2.23    .30      .78
LATAMER     2.61    .27      .99
LIFESCI     2.61    .31      .97
MATH        1.61    .39      .16
MECH        1.68    .32      .16
MILARTSC    2.46    .29      .95
POLSCI      2.40    .35      .87
PHYSICS     1.61    .39      .16
PSYCH       2.49    .38      .90
SOVSTU      2.51    .35      .92
WESTEUR     2.30    .34      .81
Now, armed with the student's academic
summary and an estimate of his performance in 
each of the 28 majors, the counselor can analyze 
the potential solutions to the student's problem. 
For example, in the sample case, the student 
shows a fairly low probability of success of .65 in 
his current major of electrical engineering, but a 
fairly high probability of success (.97) in life 
sciences. Also, he received a B in his life science 
courses. Assuming he has sufficient time remaining 
and his schedule can be appropriately arranged to 
complete all requirements for a new major in life 
sciences, a change of majors may be a solution to 
the student's academic problems. 
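The comparison a counselor makes by eye can be sketched in Python as a simple filter and sort. This is only an illustration: the function name `candidate_majors` and the 0.90 screening threshold are hypothetical, and the report leaves the judgment to the counselor. The probabilities are a handful of entries from the sample printout for student 741235.

```python
def candidate_majors(success_probability, current_major, threshold=0.90):
    """List majors whose probability of success meets the threshold,
    best first, excluding the student's current major."""
    candidates = [
        (major, prob)
        for major, prob in success_probability.items()
        if major != current_major and prob >= threshold
    ]
    return sorted(candidates, key=lambda item: item[1], reverse=True)


# A few PROB SUCCESS entries from the sample printout.
green = {
    "ELENGR": 0.65,
    "LIFESCI": 0.97,
    "HUM": 0.96,
    "LATAMER": 0.99,
    "MATH": 0.16,
}

for major, prob in candidate_majors(green, current_major="ELENGR"):
    print(f"{major:<10}{prob:.2f}")
```

Such a list would only be a starting point; as the text notes, remaining time and scheduling constraints still decide whether a change of major is practical.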

Institutional Research Application 

CACS may be used as an institutional research 
tool to study trends and characteristics with 
respect to different types of students for the
purpose of advancing an understanding of major 
problem areas that may be associated with 
academic counseling. 

The counselor may pursue an investigation of
this type individually, or he may collaborate with
other counselors or assistants. The first step the
counseling researcher undertakes is an outline of
the research design applicable to the institutional
problem. In this step, he prepares a simple
statement of the problem, the tentative explanations or
hypotheses that may apply, and the general
technique by which data would be collected and
analyzed to support or reject given solutions,
explanations, or guidelines.

The CACS system offers the counselor an 
excellent capability for doing research concerning 
student academic counseling. This capability 
provides him with: 

1. Easy access to most of the relevant data 
concerning student performance, 

2. Up-to-date comprehensive information for 
individual students or groups of students in which 
he may be interested, 

3. Meaningful formats requiring a minimum of 
data search and conversion, 

4. Flexibility as to type of data desired,

5. Rapid response time to allow timely in- 
quiries, 

6. Accuracy of data made possible by effective 
computerization. 

CACS could be used as the primary data
collection technique, although data from other sources
may also be included. In pursuing the analysis, the
researcher defines pertinent study groups by
stipulating the student control numbers for each
group in which he is interested and selecting CACS
options that are appropriate for the study. In an
exhaustive study all options may be selected,
thereby providing a complete printout that
reflects, for the student, his current predicted major
GPAs and success probabilities, course grades, and
descriptor variables. The selected student control
numbers and CACS options may be prepared in
punch card form for processing by CACS in the
batch mode.
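Preparing the request deck for a study group can be sketched in Python as below. The sketch assumes that each card carries one request in the same "OPTION STUDENT-NUMBER [MAJOR]" form shown in the on-line examples; the report does not describe the card layout, and the function name `batch_requests` is hypothetical.

```python
def batch_requests(student_numbers, options, major=None):
    """Build one request line per student and option.

    options: request mnemonics as used at the terminal (for example
    "GRAD" or "PRED"); major, if given, restricts each request to
    that major or department.
    """
    lines = []
    for number in student_numbers:
        for option in options:
            parts = [option, str(number)]
            if major is not None:
                parts.append(major)
            lines.append(" ".join(parts))
    return lines


# Study group made up of the two sample students from the text.
for line in batch_requests([741234, 741235], options=("GRAD", "PRED")):
    print(line)
```

Keeping the study group as a plain list of control numbers means the same deck can be regenerated for a later time period when trends are compared.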

Simple but pertinent statistical analyses of each 
study group are performed by accumulating the 
data printed out by CACS. For example, if the 
study concerns the relative mathematical aptitude 
of a given group, the predictor summary for each 
student in that group is scanned for the mathemat- 
ical aptitude indicator which is tabulated or trans- 
ferred to worksheets for accumulation into fre- 
quency distributions, averages, percents, etc. The 
statistics computed during any given time period
for a study group are recorded and compared with 
similar statistics computed for the same kind of 
group during a subsequent time period in order to 
isolate and identify trends. The trends are charted 
graphically using the means, percents or other 
statistics computed from the worksheets within 
any given time period. Group-to-group
comparisons may also be made to determine those
characteristics that distinguish one student group from
another. Simple graphs reflecting group differences
can also be made. 
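The worksheet tabulation described above can be sketched in Python using only the standard library. The scores below are hypothetical math aptitude values for a five-student study group (only 684 appears in the report, for the sample student), and the 100-point interval width and the function name `summarize_group` are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def summarize_group(scores, interval=100):
    """Return the count, mean, and a frequency distribution keyed by the
    lower bound of each score interval."""
    frequency = Counter((score // interval) * interval for score in scores)
    return {
        "n": len(scores),
        "mean": mean(scores),
        "frequency": dict(sorted(frequency.items())),
    }


# Hypothetical math aptitude scores for one study group.
group_scores = [684, 612, 590, 701, 655]
summary = summarize_group(group_scores)
print(summary["n"], summary["mean"])
for lower, count in summary["frequency"].items():
    print(f"{lower}-{lower + 99}: {count}")
```

Running the same summary for a later time period, or for a second group, gives the figures needed for the trend charts and group-to-group comparisons.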

By means of the procedures described here the 
counseling researcher can pursue the answers to 
questions such as: 

1. What are the characteristics of military prep
school students that distinguish them from other
students and, thereby, may be especially relevant
in the counseling of those with prep school
backgrounds?

2. Are there significant characteristics
concerning course-to-course grade differentials that
are especially important in the counseling of
minority groups?

3. What student descriptor variables are
pertinent in the differentiation of students with
unusual cultural backgrounds such as Vietnamese,
Chinese, etc.?

4. Does the forecasting of major GPA
performance, for students with very high intelligence or
enriched academic background, create special
problems in counseling?

5. Are counseling and student forecasting
procedures adequate to meet the needs of female
students? Although there are no data in CACS at
present concerning female students, such data will
be amenable to ready analysis as soon as they become
available in the system.



II. METHOD 

Introduction 

In developing multiple linear regression
equations, the developer usually selects and measures
input variables, a process which entails the
development of test batteries and/or other data
collection instruments. In the present project, however,
the input variables were those normally measured
by the school. The project did not call for the
derivation of additional inputs. Since a regression
model is only as valid as its input data, the
predictive accuracy of CACS was, therefore, totally
dependent on the data provided by the school.

Available Data

The data provided by the school consisted of:
(a) Student Master Tape File (SMTP), containing 
96 items of information accumulated over the 
years 1966 through 1971, and (b) Personnel 
Record Change File (PRCF), containing all grades 
accumulated over the years 1966 through 1971. In 
addition, fractional student data from personality 
and interest tests were also provided by the admin- 
istration. All data were associated with students 
who graduated from the school. 

Screening the Data for Relevancy,
Sufficiency, and Usability

Many items in the SMTP were not relevant to
the construction of a regression model. Such items
included "social security number," "advisor
code," and "current parent name." These items
were eliminated from the data pool, resulting in
the selection of 21 of the 96 original items for
inclusion in the model. Two additional items,
student ID number and major number, were used
as code numbers for categorizing all data related to
a given student.

All grades in the PRCF were used in
constructing the model. The grades were analyzed to derive
10 GPA variables and one criterion variable:
the major GPA score of the last two school years,
which the model was designed to predict.



Data related to the Cattell and Edwards
Inventories could not be used in the model
because they were insufficiently distributed, with 
respect to the variables available in the SMTP and 
PRCF, to allow their integration into a multiple 
regression analysis. 

The variables selected and/or derived in the 
model are listed below. (For detailed discussion of 
the variables and associated computations, see the 
Appendix, Mathematical Model Description and 
Maintenance). 

Criterion Variable 

Major GPA last 2 years 

PRCF Derived Variables 

(1) GPA - Math first 2 years

(2) GPA - Basic sciences first 2 years 

(3) GPA - Engineering sciences first 2 years 

(4) GPA - Humanities first 2 years 

(5) GPA - Social sciences first 2 years 

(6) GPA - Math first year 

(7) GPA - Basic sciences first year 

(8) GPA - Engineering sciences first year 

(9) GPA - Humanities first year 

(10) GPA - Social sciences first year 

SMTP Variables 

(11) Falcon or Skelly scholarship

(12) Turnback indicator 

(13) Estimated age at graduation 

(14) Preacademic achievement 

(15) Verbal aptitude 

(16) English composition 

(17) Total English 

(18) Math aptitude 

(19) Advanced mathematics taken 

(20) Math achievement 

(21) Total mathematics 

(22) Academic composite 

(23) Physical aptitude examination

(24) Athletic activities 

(25) Nonathletic activities 

(26) Leadership composite

(27) Weighted composite 

(28) Medical qualification 

(29) Military prep school 

(30) Other prep school 

(31) College attendance 
Model Development

A computerized multiple regression analysis
approach was used as the major technique for
developing the equations required in the model to
predict academic success. (A detailed description
of model development is contained in the
Appendix.) The validity of such a model is a direct
function of the size and relevance of the data base
made available for statistical analysis. The full
range of variables available from the SMTP and PRCF
data files was considered as independent variables
for the model. As summarized in the objective
paragraph, page 9, 31 such variables were
considered as predictors for the criterion variable, which
was the major GPA. In simple form, these 31
variables can be expressed as a vector (or column
of data) that, when weighted appropriately by a
parallel vector composed of regression coefficients
and adjusted for a regression constant, will
generate a point estimate (Y) with respect to the
major GPA in a specified field; e.g., chemistry.
This process can be summarized in the expression
BX + C = Y, wherein the Bs and Cs are adjusted
statistically by the multiple regression analysis
technique for each major within each student
level (A, B, or C), depending on the unique
correlation of each independent variable with the major
GPA in past samples of students who have a
complete performance record; i.e., those who have
graduated.

Variables that exhibit characteristics that are 
redundant with other variables or that show poor 
partial correlations are kept in the model for 
future development purposes, but are assigned a 
null regression coefficient that effectively cancels 
out their effects until such time as it is desirable to 
weight them in the model. 
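The expression BX + C = Y and the use of null coefficients can be illustrated with a minimal Python sketch. It is not the COBOL implementation; the function name `point_estimate`, the three-element vectors, and every coefficient value are hypothetical and chosen only to show the arithmetic (the actual model carries 31 predictors).

```python
def point_estimate(predictors, coefficients, constant):
    """Return Y = B.X + C for one major and one student level."""
    if len(predictors) != len(coefficients):
        raise ValueError("predictor and coefficient vectors must match in length")
    weighted = sum(b * x for b, x in zip(coefficients, predictors))
    return weighted + constant


# Hypothetical illustration with three predictors.
x = [2.76, 3.23, 575.0]   # e.g. GPA - Math, GPA - Basic sciences, an admission score
b = [0.40, 0.25, 0.0]     # the null coefficient removes the third predictor's effect
c = 0.50                  # regression constant

y = point_estimate(x, b, c)
print(round(y, 4))
```

Because the third coefficient is zero, any value of the third predictor yields the same estimate, which is how a variable can stay in the vector without influencing the prediction.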

During analysis, 28 different majors were
observed in the data base. A separate expression,
similar to that described above, was developed for
each major by three different student maturity
levels, where A represented a student with over 45
credit hours, B a student with 15-45 credit hours,
and C a student with less than 15 credit hours. In
all, 84 multiple regression equations were obtained
and installed in the CACS Prediction Module. Each
equation was designed to provide a point estimate
of major GPA performance for any student being
counseled.
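The selection of one of the 84 equations can be sketched in Python as a lookup keyed by major and maturity level. The major codes are those shown in the sample printouts; the function names are hypothetical, and the treatment of exactly 15 and exactly 45 credit hours as level B follows the "15-45" range stated in the text.

```python
MAJORS = [
    "AERO", "AMERSTU", "ASTRO", "BASSCI", "CHEM", "CIVENGR", "COMPSCI",
    "ECON", "ELENGR", "ENGRMGT", "ENGRSCI", "FAREAST", "GENENGR", "GENSTU",
    "GEOG", "HISTORY", "HUM", "INTAFF", "LATAMER", "LIFESCI", "MATH",
    "MECH", "MILARTSC", "POLSCI", "PHYSICS", "PSYCH", "SOVSTU", "WESTEUR",
]


def student_level(credit_hours):
    """Map accumulated credit hours to maturity level A, B, or C."""
    if credit_hours > 45:
        return "A"
    if credit_hours >= 15:
        return "B"
    return "C"


def equation_key(major, credit_hours):
    """Return the (major, level) pair that identifies one regression equation."""
    if major not in MAJORS:
        raise KeyError(f"unknown major code: {major}")
    return (major, student_level(credit_hours))


all_keys = [(major, level) for major in MAJORS for level in "ABC"]
print(len(all_keys))                    # 84
print(equation_key("MATH", 141.25))     # ('MATH', 'A')
```

A table of coefficient vectors and constants indexed by these keys would then supply the B and C values for the point estimate.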

Along with the point estimate, the model
provides a standard error of estimate that could be
used by the counselor in assessing the relative
degree of confidence he could place in the point
estimate. Initially, the standard error term placed
in the model was derived from the equation
derivation sample by examining the deviations
between the value the equation predicted for the
sample and the actual major GPA. This error
term may eventually be replaced by a refined
standard error of forecasting term, obtained by
similarly examining point estimate deviations but
for a new sample of students. This potential
refinement is described in the construction of
prediction equations paragraph, page 14.
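
A standard error of estimate of this kind can be sketched as below. The report does not state the exact formula used; the degrees-of-freedom convention (n minus the number of predictors minus one) is an assumption, and the sample values are made up for illustration.

```python
import math
from typing import Sequence


def standard_error_of_estimate(
    actual: Sequence[float], predicted: Sequence[float], num_predictors: int
) -> float:
    """Root of the residual sum of squares over the residual degrees of freedom."""
    n = len(actual)
    if n != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    dof = n - num_predictors - 1   # assumed convention, not stated in the report
    if dof <= 0:
        raise ValueError("not enough observations for this many predictors")
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sse / dof)


# Hypothetical derivation sample: actual major GPAs and equation outputs.
actual_gpa = [2.0, 3.0, 2.5, 3.5, 3.0]
predicted_gpa = [2.2, 2.8, 2.6, 3.3, 3.1]
se = standard_error_of_estimate(actual_gpa, predicted_gpa, num_predictors=1)
```

Applying the same function to a new sample of students, rather than the derivation sample, would give the refined standard error of forecasting mentioned above.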

A third capability developed for the model was 
an algorithm for computing the probability that a 
student will be successful in a given field of 
academic endeavor. Here the point estimate of
grade performance (previously described) is 
combined with the standard error term and a 
constant representing the minimum permissible 
major GPA performance level at the school (2.00) 
to provide the required probability statement. The 
procedure consisted of expressing the minimum 
acceptable performance as a deviation from the 
point estimate. This deviation is converted into a 
standardized Z-ratio expressed in units of the 
standard error of estimate. This ratio is converted 
via standard normal curve table transformation 
into a probability statement that reflects the
extent to which the student's actual major GPA is 
likely to fall into an acceptable performance 
region. 
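
The probability calculation can be sketched as follows. The report describes a standard normal curve table lookup; this sketch substitutes the error function from the Python standard library, which gives the same normal-curve area. The function name and the example inputs are hypothetical; the 2.00 minimum is the value stated above.

```python
import math

MIN_ACCEPTABLE_GPA = 2.00  # minimum permissible major GPA stated in the report


def probability_of_success(
    point_estimate: float,
    standard_error: float,
    minimum: float = MIN_ACCEPTABLE_GPA,
) -> float:
    """Probability that the actual major GPA falls at or above the minimum."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    # Minimum acceptable GPA expressed as a deviation from the point
    # estimate, in units of the standard error of estimate (Z-ratio).
    z = (minimum - point_estimate) / standard_error
    # Area of the standard normal curve above z.
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


# Hypothetical student: predicted major GPA 2.50, standard error 0.50.
p = probability_of_success(2.50, 0.50)
```

With these made-up inputs the minimum lies one standard error below the point estimate, so the probability is roughly 0.84.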

In the process of selecting a prediction vehicle 
for forecasting student performance, five types of 
modeling approaches were examined. 

1. Correlation-based models, including multiple
regression analysis and the use of joint-occurrence 
matrices 

2. Cluster-oriented models that make use of 
factor analytical techniques 

3. Tree-structure models that treat data in 
sequential foliations 

4. Pattern analysis models where subsets of 
similar performance characteristics are used to 
define properties of the model

5. Rational modeling where predictions are 
made on the basis of data-guided estimates made 
by the counselor. 

While evaluating potential modeling
approaches, it was also necessary to examine the
amount and characteristics of the data available,
the availability of central and peripheral computer
hardware, and the school's overall software
configuration. Criteria utilized included:

1. Parsimony of assumptions required for
model employment

2. Demonstrated effectiveness in actual
application

3. Efficiency in all modes of operation

4. Flexibility and adaptability 

Cluster-oriented models would probably
provide interesting aspects of student performance
that could be expressed as major themes or factors
in academic work, but they would not have solved
the direct requirement for achieving a visible
estimate of academic performance. Thus, they 
were eliminated from consideration in the initial 
development phase. 

Decision tree structural models, where students 
are analyzed on branches of an analytical tree, 
provide yet another potential and interesting 
recourse to investigation of student data. However, 
they tend to duplicate the results of the multiple 
regression approach, while requiring considerable 
sophistication in data manipulation on the part of 
the analyst. They also tend to require elaborate 
computer capacity; thus, they were excluded from
consideration for CACS. 

Pattern analysis models were considered too
difficult to achieve in the time allowed, and they
also typically lack the visible forecasting capability
provided by multiple regression analysis. The cost
of these models is difficult to control and usually 
runs to excessive proportions. 

Rational modeling will be used to some extent
by all counselors regardless of the formal model
used in the computer system. No formal attempt
was made to structure this type of model since it
was considered impractical to collect, specify, and
program all the different possible configurations
that different counselors employ in their
characteristic approaches to student guidance.

Based on a rigorous examination of the five 
types of modeling approaches utilizing the criteria 
identified above, the multiple regression analysis 
approach was selected as the one that would (a) 
provide the greatest initial and long-term benefits, 
(b) be the easiest to install and maintain, (c) lend 
itself best to modification and expansion, and (d) 
be the most cost-effective.



Computer Program Development 

CACS program modules were designed around 
the basic function of the system utilizing one 
module per function. The four modules and their 
respective functions are: 

1. Control module 

2. Data retrieval module 

3. Prediction Module 

4. Data base extraction module

This modular approach not only simplified
program development and maintenance, but also
makes use of the overlay capability of the B-3500
COBOL compiler to reduce the amount of 
computer storage required to operate the system. 

Data that are likely to require frequent updates, 
such as predictor values, are allocated to external 
files that are easily modified by the data base 
extraction module. This module maintains a set of 
values that accurately describes the current en- 
rollment at the school. 

System commands have been kept simple and 
direct to enable a user to quickly learn the system. 
Outputs for display and hard copy terminals were 
designed to present data in a concise and meaningful
way. All counseling aid commands in the
system are available at the remote terminal and as
a batch job via card input at the computer, to
provide the user with optimum methods of retrieving
data. For example, a counselor can request
fairly large amounts of data via batch processing
before a counseling appointment with a student, 
and/or retrieve additional information online 
from the computer via his remote terminal during 
the interview. 

In general, CACS has been designed as a 
straightforward and usable tool for academic 
counselors. This report has deliberately avoided
complicated system operation procedures,
thereby reducing training and maintenance efforts
while increasing utility. 

Model Capabilities 

The capabilities of the four CACS program
modules are presented below. (A detailed description
of the modules is reserved in the project file
for ILIR-00-31.)
Control Program Module

This program module, performing the executive 
function of the program system, ensures the 
parameters required for CACS interface with the 
school's Burroughs B-3500 computer system.

Prediction Program Module

This program module contains the equations 
necessary for computing achievement predictions 
for a given student. It represents the core of CACS 
and is designed to execute all forecast and summarizing
computation and output resulting data.
The prediction program module was designed to
provide the three outputs noted in the general
paragraph, page 1; i.e., probability of success in
major field, GPA point estimate, and a standard
error associated with the point estimate. The three
output predictions are determinable for each of the 28
major fields of study. Thus, a counselor can
acquire a total predictive profile consisting of 28 
GPA predictions, 28 standard errors, and 28 
success probabilities on a given student for all 
curricula. If desired, a printout of all predictor 
values used in the derivation of the predictions and 
all academic grades can be executed. 

Data Retrieval Program Module 

The data retrieval program module retrieves the 
required student data from the CACS data base 
stored on the disk. This module operates in
conjunction with the prediction program module and
does not require user intervention. 

Data Base Extraction Program Module 

This program module is used in the maintenance
and creation of the CACS data base which is
subsequently stored onto disk for use by the 
CACS system. 



Specific Applications 

Three sets of equations were designed, one for each
model level: A, B, and C. Every effort was made
to maximize the utility of each level. The project
investigators recognized that the earlier in the
academic career of a student that achievement
predictions can be acquired, the greater the utility
and effects of counseling guidance. It should be
recognized that a concomitant of early application
is decreased model validity, due to fewer input
data and lesser quality of data. Such an outcome is
inevitable by the very nature of prediction models
and was an hypothesized expectancy. It is stressed
here to ensure that the user be cognizant of the
model's limitations in providing counseling
guidance information.

Brief Summary of Project
Developmental Tasks

The CACS project included six distinct, but 
related tasks: 

1. Conduct an alternative system study

2. Design, develop, and test a prediction model

3. Design, develop, and test real-time computer 
programs 

4. Procure, integrate, and deliver display and 
hardcopy equipment 

5. Provide documentation and briefings 

6. Deliver, install, and validate the system 

Task 1 - Conduct An Alternative 
System Study 

A complete description of Task 1 and resulting
recommendations is reserved in the project file for
ILIR-00-31 concerning an Analysis and Evaluation
of Alternative Computer System Configurations for
the Computerized Academic Counseling System.
This task involved a thorough description of
system requirements, an analytical discussion of 
each of three alternative computer systems that 
could be used for CACS, cost comparisons of the 
three alternatives, and rationale for selecting a 
specific alternative. 

Task 2 - Design, Develop, and Test
A Prediction Model

Task 2 is thoroughly described in the appendix. 
This task involved the development of the selected
mathematical model used in the derivation of 
prediction equations and of methodology required 
for updating these equations as additional data 
become available. 

Tasks 1 and 2 were performed concurrently to 
provide timely design data for use in Tasks 3 and 
4. 

Task 3 - Design, Develop, and Test Real-Time
Computer Programs

Task 3 is documented in detail in the project
file for ILIR-00-31. It consisted of developing: (a)
necessary computer programs, (b) specifications for
operating the system, (c) step-by-step procedures
for using CACS, and (d) methodology for
maintaining the integrity of the system.

The design philosophy employed emphasized a 
modular approach and the utilization of existing 
or standard computer facilities. This philosophy
provides for the most efficient use of system 
resources and facilitates updating, modifying and 
expanding the system to keep pace with academic 
counseling requirements and changing computer 
facilities. 

Task 4 - Procure, Integrate, and Deliver
Display and Hardcopy Equipment 

Task 4 involved the acquisition and integration 
of display and hardcopy equipment for two different
terminal configurations to be used in four
CACS counselor positions. Both configurations 
consist of a cathode ray tube display terminal, 
hardcopy printer and a data set. The same data 
sets are used in both configurations, Omnitec 
701A Acoustic Telephone Couplers. These cou- 
plers are capable of operating over either leased or 
dialed lines and interface with both Teletype 
(TTY) and RS232 terminal devices. 

The first CACS terminal configuration consists
of a Data Point 3000 Display Terminal, a
Teletype Model 33 RO (Read Only) for hardcopy
output, and an Omnitec 701A Acoustic Coupler. The
second terminal configuration is composed of a
Teletype Model 33 KSR (keyboard send/receive),
an Ann Arbor 202 display terminal with a 9-inch
video monitor, and an Omnitec 701A Acoustic
Coupler.

Each of the terminal configurations is capable 
of being operated over either leased or dialed 
telephone lines, with or without hardcopy
output. System operations and test procedures for 
CACS are given in the project file for ILIR-00-31, 
CACS Installation and Test. 

Task 5 - Provide Documentation and Briefings

This task is self-evident; the present technical
report and documentation, cited elsewhere within
this report, constitute the products of this task.

Task 6 - Deliver, Install, and Validate the System 

Task 6 consisted of: (a) exercising the model
using the criterion input GPA variable derived
from the class of 1972 through the fall semester of
1971, (b) computing the multiple linear regression
equation predictions, and (c) subsequently
validating the derived equations. All subtasks have been
completed.



III. RESULTS

Introduction 

Data for 3234 graduated students concerning 
31 potential predictor variables with regard to 
major GPA were extracted from the CAIDS-PRC 
files for the time period 1967 through 1971. Due 
to technical difficulties associated with different 
tape design, 1966 student information could not 
be incorporated into the basic data. A study of the 
sample sizes for the majors, involved in 1966, 
indicated that the omission of this data would not 
seriously disturb the equations to be derived. 
Punched card data, concerning Cattell and 
Edwards personality information for students 
during the time period 1968 through 1971, were
received from the school, but these could not be 
properly integrated with the SMTF-PRC files for 
multiple regression analysis purposes. 

Construction of Prediction Model 

Data for 3234 graduated students were parti- 
tioned by major areas of study across graduating 
years as shown in Table 1. The graduates in each 
major were combined into sample study groups. 
Frequency distribution and variance analyses were 
conducted across all majors for the 31 numerical 
variables extracted from the SMTF-PRC files. 
These variables are summarized in the screening 
paragraph, page 8. The detailed logic for their 
derivation is explained in the appendix. 

The variance ratios examined were statistically 
significant. This was particularly true for major 
GPA, indicating that inter-major differences were 
sufficient to preclude grouping of students into 
larger groups for the purpose of predicting grade 
performance. Therefore, separate multiple regres- 
sion equation models for each major were derived. 
In addition, three levels of student maturity, at 
which the CACS system would operate in making 
grade point average forecasts, were established. 

These levels were as follows: 

Level A - the student has over 45 credit hours. 

Level B - the student has 15 to 45 credit hours.
Level C - the student has less than 15 credit
hours.

Table 1. Distribution of Graduated Students by Years and Majors

MAJOR   CODE   1967   1968   1969   1970   1971   TOTAL
HUMAN     01     10      8      5      5      2      30
BASSC     02    103     48     25     17      6     199
ENGSC     03     40     53     37     15     16     161
INTAF     04     87     92     55     74     50     358
MILSC     05      8      6     10      5      2      31
MATH      06     42     35     20     18     26     141
ASTRO     07     32     49     51     30     25     187
HIST      08     10     18     21     25     36     110
ENGMG     09     63     74     83     59     92     371
CVENG     10     24     34     47     40     34     179
ELENG     11     12     20     17     20     27      96
ENGMC     12      8      9     15     22     42      96
CHEM      13     13      9     11      9      7      49
PHYS      14     10     17     11     25     17      80
ARENG     15     14     22     50     60     47     193
PSYCH     16      8     19     17     13     16      73
ECON      17     12     14     31     31     23     111
POLSC     18      0     11      9     27      5      52
GEOG      19      0      2     14     12      7      35
AMSTD     20      0      1      8      3      4      14
GNSTD     21     19     46     74    126     64     329
GNENG     22      2     15     16     12     27      72
CPTSC     23      0      0     23     24     42      89
LIFSC     24      0      0      0     33     49      82
FESTD     25      0      0      3     11      6      20
LASTD     26      0      0     11     16     12      39
SVSTD     27      0      0      2      7      9      18
WESTD     28      0      0      8      6      5      19
TOTAL           517    602    678    745    692    3234

Modeling by levels necessitated the construc- 
tion of 84 different multiple regression equations 
for installation in the prediction module of the 
system. 

Construction of Prediction Equations

Each of the 31 potential predictor variables was 
statistically evaluated for inclusion in each of the 
84 equations. On the average, about 5 computer 
runs were required for each regression analysis or a 
total of about 400 runs. The detailed technical
procedures by which predictor variables were
screened and selected are described in the appendix.
The analytical data and results used in
equation development appear in the project file 
for ILIR-00-31. 

Level A Model 

The results of the Level A Model development
are summarized in Table 2, which shows in column
2 the multiple correlation for each major. Squaring
this value gives the specific amount of statistical
efficiency (100 x R^2) obtained in the data
sample for each equation.
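
The conversion from multiple correlation to statistical efficiency is simple arithmetic and can be sketched as follows. The function name is hypothetical; the two R values are the highest and lowest Level A entries reported in Table 2.

```python
def statistical_efficiency(multiple_r: float) -> float:
    """Return the percent of criterion variance accounted for, 100 x R^2."""
    if not 0.0 <= multiple_r <= 1.0:
        raise ValueError("multiple R must lie between 0 and 1")
    return 100.0 * multiple_r ** 2


american_studies = statistical_efficiency(0.84)   # highest Level A value of R
computer_sciences = statistical_efficiency(0.54)  # lowest Level A value of R
```

These evaluate to about 70.6 and 29.2, which correspond to the 70 percent and 29 percent figures quoted in the text.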

The statistical efficiencies range from 29 
percent for Computer Sciences majors to 70 
percent for American Studies majors. The effi- 
ciency for the latter group is believed to be 
inflated by the small sample size of 14 students on 
which that equation was based. However, the 
efficiencies for Engineering Mechanics and
International Affairs majors are quite high (64 percent,
62 percent) and are not considered to be inflated
by small sample sizes. The efficiencies for other
majors are correspondingly high. The overall 
percent of performance variance accounted for by 
all equations at this level is 49 percent, which 
compares favorably with most similar studies of 
university performance. This supports the con- 
clusion that the mathematical Level A Model is 
generally quite efficient in forecasting major GPA 
performance. 

Table 2 also shows the relative percent statistical
contribution of each selected predictor
variable to the forecasting efficiency of each
equation. The bottom row of Table 2 exhibits the
summary impacts of each predictor across all
majors. Variables are arranged from left to right in
ascending order of impact on major GPA. Thus, it
can be observed that the largest impact occurred
for Social Sciences GPA (14 percent), while the
least impact occurred for English Composition (.3
percent).

In summary, the Level A Model relies heavily
on early GPA revealed during the first two years at 
the school but includes some weight (about 2 
percent) for available pre-college variables. The 
Level A Model should reflect a high degree of 
utility for counselors, when they are working with 
students at the critical decision point in their 
academic life; i.e., when students must choose a
major field of endeavor. 

Level B Model

The results of the Level B Model development 
are summarized in Table 3. Here, the statistical 
efficiencies range from 14 percent for Computer 
Sciences to 70 percent for American Studies. The 
efficiencies for the three top majors are probably 
inflated by small sample size and should be
discounted. However, the efficiencies for International
Affairs (51 percent) and Mathematics (50
percent) are not inflated by small sample sizes and
can be regarded as being highly respectable in the
forecasting realm. The statistical efficiencies for 
the other majors are lower, but still considered 
quite respectable for prediction purposes. The 
overall efficiency of the Level B Model was 38 
percent as compared to the 49 percent observed 
for Level A. This clearly indicates that the Level A 
Model is superior and should be used whenever 
possible to assess future student performance. 

The availability of the Level B Model enhances 
the range of operational utility for CACS, since it 
can be applied earlier than the Level A Model. 
Considering that the model currently has no access 
to motivational factors, occupational interest, or 
specific aptitudes, the 38 percent overall efficiency
appears to be quite acceptable for use in counseling
at an intermediate point in the student's
academic life; i.e., sometime prior to the time that 
a major must be selected. 

As in the previous table, the selected predictor 
variables in Table 3 are arranged in ascending order 
of impact. For the Level B Model, the largest
impact across all majors occurred for Humanities 
GPA (10 percent) while the least occurred for 
Mathematical Aptitude (1 percent). 

In summary, the Level B Model relies heavily 
on first year GPA and includes an increased weight 
Table 2. Academic Performance Prediction Model
(Maturity Level A*)

                          Pre-College Predictors           4th and 3rd Class Year Predictors
MAJOR                  ENGL   VERB   MATH   MATH  PRIOR    GPA    GPA    GPA    GPA    GPA
ABBREV      R**    N   COMP    APT   ACHT    APT   ACAD   MATH BAS SC ENG SC  HUMAN SOC SC
AM STD      .84   14      -      -      -      -     34      -      -      -     27      9
ENG MECH    .80   96      -      -      -      -      -     11     27     18      5      3
INTAF       .79  358      -      3      -      -      4      2      -      3     19     31
MATH        .77  141      -      -      6      -      -     25      6      3      -     15
LA STD      .77   39      -      -      -      -     17      -      -      -     14     28
ASTRO       .76  187      -      -      1      -      -     13     15     29      -      -
FE STD      .75   20      -      -      -      -      6      -      -      -     25     26
ARENG       .75  193      -      -      -      -      -      6      8     27      5     10
ENG SC      .73  161      -      -      -      1      3     13      7     21      -      8
ECON        .73  111      -      2      -      -      -      -     15      4      9     23
ENG MGT     .71  371      -      -      -      1      -      -      6      7      6     31
LIF SC      .71   82      4      -      -      -      -      -     20      -      -     26
PSYCH       .70   73      -      -      -      -      -     17      -      -     13     19
GEOG        .69   35      -      -      -      5      -      -      6      -     12     25
POL SC      .69   52      -      -      -      -      6      4      9      -     11     17
MIL SC      .69   31      -      -      -      -      7     11      -     18      -     11
CIV ENG     .68  179      -      -      -      -      -      5     14     15      6      6
GEN ENG     .66   72      -      -      -      2     11     11      2     16      -      4
PHYSICS     .67   80      -      -      4      -      -     10     20      4      4      3
HUMAN       .66   30      3      -      -      -      -      -      2      6     26      7
EL ENG      .66   96      -      -      -      9      -      -      4     25      -      5
WE STD      .64   19      -      -      -      -      9      -      -      -     28      4
HIST        .63  110      -      -      -      -      -      5      -      -     21     14
CHEM        .62   49      -      5      4      -      -      2     25      -      3      -
BAS SC      .61  199      -      -      -      -      6      -     12      4      9      4
GEN STD     .59  329      -      -      -      -      4      -      5      4      8     14
SV STD      .57   18      -      -      -      -      5      -      -      -      3     25
CPT SC      .54   89      -      -      -      -      -      3      -     17      4      5
AVERAGE
MULTIPLE R  .69            .3     .4     .6     .7      4      5      7      8      9     14

*Level A Model is based primarily on predictor data compiled during students'
first two years at the Academy during the time period 1967 through 1971.

**Represents the multiple correlation coefficient of the predictor variables with
major GPA. R was squared and multiplied by 100 to obtain the figures discussed
in the text.
Table 3. Academic Performance Prediction Model
(Maturity Level B*)



Pre-College Predictors 4th Class Year Predictors

AfiiREV "^TH ENGL VERB MA^ PRIOR GPA CPA CPA CPA 
R** N APT COMP APT ., AQil ACAB M™ SOg Sg 

23 

23 25 

31 16 

e 10 1:3 

£ 9 4 

5 25 
14 21 3 
11 12 U 

6 22 iO 

7 13 

4 13 
7 7 19 

2 24 

3 6 4 

5 14 

3 10 6 
6 13 

10 

11 13 
9 10 
9 10 : 
19b 

'..4 

4 4 10 
2 4V 
2 



AVERAGE 
MULTIPLE .61 



AM STD 


.84 


14 






14 




33 




LA STD 


.77 


39 










12 




KK STD 


.77 


20 




7 






5 




INTAF 


.71 


358 






5 




7 




MATH 


.71 


141 








7 


5 


17 


KNG MECH 


.69 


96 










2 


16 


Lir SC 


.69 


82 




4 








6 


ECON 


.66 


111 






5 




3 




GKOG 


.65 


35 


4 












i'OL SC 


.64 


52 










6 




ASTRO 


.63 


187 








4 




19 


ENG KGT 


.62 


371 


2 








1 


2 


WE STD 


.62 


19 










12 




ENG SC 


.61 


■ 161 


4 








4 


16 


PSYCH 


.60 


73 












17 


ARENG 


.59 


193 








3 




11 


MIL SC 


.59 


31 










8 


6 


r.N ENG 


.58 


72 


3 








16 


15 


KL h;:g 


.57 


96 


12 








3 


8 


HIST 


.57 


110 






1 




1 


5 


CV ICNG 


.57 


179 












5 


PHYSICS 


.57 


80 








4 




6 


MS SC 


.54 


199 










10 


1 


SVSTD 


.53 


18 




10 






4 




GKSTD 


.51 


329 






2 




6 




HUMAN 


.48 


30 




8 












.46 


49 




1 


5 


9 




4 


cn SC 


.37 


89 












5 



*Level B Model is based primarily on predictor data compiled during students'
first year at the Academy during the time period 1967 through 1971.

**Represents the multiple correlation coefficient of the predictor variables
with major GPA. R was squared and multiplied by 100 to obtain the figures
discussed in the text.






(about 8 percent) for available pre-college
predictors.

Level C Model 

The results of the Level C Model development 
are summarized in Table 4. The statistical effi- 
ciencies for this model range from 2 percent for 
Computer Sciences to 62 percent for American 
Studies. The latter figure can be discounted due to 
the small sample on which it was based. The 
overall efficiency of the Level C Model was 19
percent, which was about 50 percent below that of
Level B and about 60 percent below that of Level
A. This is a considerable drop in statistical
efficiency, and some of the lower powered equations
may be questioned with respect to their validity
in assessing a student's future performance at the time
he first arrives at the school. However, considering
the sharp gains available from the Level A and B 
Models and the generally better than chance 
efficiency of the Level C Model, the three models 
together provide good capability for use in pro- 
gressive counseling. 

The state-of-the-art in student career fore- 
casting will be improved significantly as CACS is 
applied in counseling. There is little doubt that 
additional improvements can and will be possible
after the system first becomes operational. These 
improvements will extend to the three maturity 
models described here as well as other models that 
can be incorporated into the approach. A dis- 
cussion of several improvement possibilities is 
provided in the recommendations paragraph, page 
26. 

Prep School Effects on the Prediction Model

A special investigation of the 1967-1971 
graduate data compiled during model development 
was conducted for the purpose of examining
predictor-criterion differentials for students who 
attend prep schools. 

The students were divided into five groups as 
follows: 

1. No previous college or prep school
(N=2360)

2. Military prep school (N=301)

3. Other prep school (N=65)

4. Previous college (N=457)

5. Not classifiable (N=51)




Within each group, grade performance was 
partitioned into the following subject matter 
areas: 

1. Basic sciences (other than mathematics)

2. Social sciences 

3. Engineering sciences 

4. Humanities 

5. Mathematics 

Separate grade point average compilations were 
made for the freshmen and the freshmen plus the 
sophomores. The tabulations derived from this 
analysis are shown in Table 5, which indicates
that the military prep school attendees were
clearly inferior in performance to the non-preps in
every respect but one: freshman Mathematics. In
this subject matter area, the military preps actually 
exceeded the non-preps; however, the difference 
was not statistically significant. The math GPAs 
for other prep school attendees did not stand out 
in this respect as much as those for military preps; 
however, they also are inflated in relation to the 
other subject matter GPAs. 

Data for students who had no previous college
or prep school and those who attended military prep
school were analyzed still further by partitioning
among majors that had enough prep school
attendees to allow meaningful comparisons to be
made. The results of this analysis are shown in
Table 6. The same general pattern as that observed
in Table 5 prevails. For all but a few majors; i.e.,
Civil Engineering, Computer Sciences, and
Electrical Engineering, the preps surpass the non-preps
in Mathematics during the freshman year. The
difference between the preps and non-preps is
substantially narrowed during the sophomore year.
This can be observed in the mean differences in
GPAs for the freshman and sophomore years, as
compared to the sophomore year alone.

In every other subject matter area, the prep 
school graduates are clearly and consistently 
inferior to the non-prep graduates during the 
freshman and freshman plus sophomore years. 
This inferiority also extends to major GPA per- 
formance during the junior and senior class years. 

The graduates who attended a previous college 
compare very favorably with those who had no 
prep school attendance. Consequently, these two 
types of students appear to be mergeable for
purposes of model derivation. 



Table 4. Academic Performance Prediction Model 
(Maturity Level C*) 



Pre-College Predictors



























intr/at>v 


FST AGE 


MATH FNGT 


MATH 




.* Ulu 1 


i\n l> Ixr* V • 






MATH 




APT rHMP 


ArviT 

J. 






AMSTD 


.79 


14 




1 


24 




12 


25 


XN'TAF 


.58 


358 




3 


5 




11 


15 


LASTD 


.57 


39 










6 


27 


MATH 


.5A 


141 




1 


2 1 


13 




12 


r-ESTD 


.5A 


20 






c 




7 


17 


iiLKNG 


.50 


96 


3 


1 


15 






6 


GNUNG 


.50 


72 






4 






21 


I-CGM 


.48 


111 












11 


Mil. SC 


.A8 


31 


3 






4 


c 


11 


am 


.46 


49 






2 


13 


0 




PHYSICS 


.46 


80 


6 


2 




11 




2 


SVSTD 


.42 


18 




6 


6 






6 


Wl-STD 


.42 


19 








12 




6 


HAS SC 


.42 


199 




1 








16 


;-:NC SC 


.40 


161 




2 


5 






7 


CMOG 


.40 


35 






6 






IG 


ASTKO 


.37 


187 




3 


3 


5 






i-NG MOT 


.37 


371 






2 


1 


Ji 


6 


:»0t. SC 


.37 


52 






1 






1^ 


c;n std 


.37 


329 








1 


J 


10 


iilST 


.36 


110 




2 






0 


5 


,iuMAN' 


.35 


30 




1 


10 








UF SC 


.33 


82 




2 


5 




X 


-> 


AK ENC 


.32 


193 


1 






5 




4 


CV ENG 


.30 


179 




2 


1 






6 


i:ng MECH 


.28 


96 












7 




.20 


73 




2 








2 


CPT SC 


.14 


89 








1 




1 



AVERAGE 

MULTIPLE .42
R 



*Level C Model is based on pre-college data for students, collected during
the time period 1967-1971. 

**Represents the multiple correlation coefficient of the predictor variables 
with major GPA. R was squared and multiplied by 100 to obtain the figures 
discussed in the text. 
The conclusion derived from these comparisons
was that the grade point averages in the
mathematics subject matter area for 4th class year
students who had attended the military prep
school were higher than those of other students
with similar abilities. This elevated grade point
average in mathematics, though reduced during the
sophomore year, appears to carry through both
years. Because the use of these elevated grade
point averages led to a predicted performance
level for these students that tended to be higher
than that actually experienced in the junior and
senior class years, several adjustments were made
to the forecasting model for predicting major GPA
performance.

The first adjustment involved a correction in
the Mathematics GPA used by the model for
predicting major GPA performance for students
who attended military prep school. This
adjustment was computed as follows:

                                        FRESHMAN AND
                        FRESHMAN YEAR   SOPHOMORE YEARS
                          MEAN GPA        MEAN GPA
Basic Sciences              2.48            2.46
Social Sciences             2.51            2.56
Engineering Sciences         *              2.59
Humanities                  2.67            2.64
Mean GPA Non-Math           2.55            2.56
Mean GPA Math               2.92            2.85
Correction                 - .37           - .29

Thus, the model reduces the Math 4 GPA by
.37 in Level B and the Math 4+3 GPA by .29 in
Level A before using these variables in predicting
the major GPA for military prep school attendees.

For non-military prep school attendees, a small
adjustment was also made as follows:

                                        FRESHMAN AND
                        FRESHMAN YEAR   SOPHOMORE YEARS
                          MEAN GPA        MEAN GPA
Basic Sciences              2.49            2.54
Social Sciences             2.60            2.60
Engineering Sciences         *              2.43
Humanities
Mean GPA Non-Math
Mean GPA Math
Correction

The asterisk indicates that only 83 out of 301 
(28 percent) of tlie military prep school attendees 
and 17 out of 65 (26 percent) of non-military prep 
school attendees had taken any Engineering 
Sciences courses during the fresliman class year. 
This vas not considered a sufficiently adequate 
sampling of this variable to include in the correc- 
tion calculations for the freshman year. 

Thus, the mode! reduces the Math 4 GPA by 
.10 in Level B and the Math 4+3 GPA by .10 in 
Level A before using tliese variables in predicting 
the major GPA for non-military prep school 
attendees. 

Although the Engineering sciences GPA also 
was out of line for military prep school attendees 
during the freshman year, a correction for this 
condition was not made because this variable is 
not currently used in making Level B predictions. 
The data for this variable are not available on 
enough students to consider it for this purpose. No 
adjustment is required for previous college 
attendees or non-preps because they arc consid- 
ered as normative students for the purpose of 
statistical model construction. 

Since the er^uation derivations were all com- 
pleted before the prep school analysis was 
CLTipleted, the prep school attendees were in- 
cluded in the calculations that produced those 
equations. Therefore, the equation coefficients 
assigned to Math 4 + 3 GPA in Level A and to 
Math 4 GPA in Level B may be somewhat dis- 
torted. Other equation coefficients in these models 
might also have been influenced by the inclusion 
of these non-normative students in the mod3l. The 
initial model was, however, mainly determined on 
the basis of regular students using numerous non- 
math predictor variables so that the amount of 
bias due to the prep school predictor-criterion 
differential should be small. 





FRESHMAN 




AND 


FRESHMAN 


SOPHOMORE 


YEAR 


YEARS 


MEAN GPA 


MEAN GPA 


2.70 


2.65 


2.60 


2.56 


2.70 


2.66 


- .10 


- .10 



When the model is updated, all prep school
attendees should be deleted. If this should affect
sample size for certain majors such that it dips
considerably below 50, an alternative approach
should be used. That is, the prep school attendee
should be retained in the statistical calculations for
the given major, but his Math 4 and Math
4 + 3 GPAs should be reduced by the appropriate
corrections described previously.

The Math GPA adjustments described above
could disappear entirely or become larger with
succeeding classes of graduates. The statistical
procedures previously described should be reapplied
periodically to monitor this condition and
to make further adjustments accordingly.
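
The correction arithmetic described above can be re-run on new class data with a few lines of code. The following Python sketch is an illustration only: the function names are hypothetical and not part of CACS, and it assumes (as the tabulated figures suggest) that the non-math mean is rounded to two decimals before the math mean is subtracted. The sample inputs are the military prep school figures reported earlier.

```python
def math_gpa_correction(non_math_area_gpas, math_mean_gpa):
    """Correction = mean of the non-math area GPAs minus the mean math GPA.

    Assumption: the non-math mean is rounded to two decimals first,
    which reproduces the figures tabulated in the report.
    """
    non_math_mean = round(sum(non_math_area_gpas) / len(non_math_area_gpas), 2)
    return round(non_math_mean - math_mean_gpa, 2)


def adjusted_math_gpa(student_math_gpa, correction):
    """Apply the (negative) correction to a student's math GPA."""
    return round(student_math_gpa + correction, 2)


# Military prep school attendees, freshman year (Engineering Sciences excluded).
freshman = math_gpa_correction([2.48, 2.51, 2.67], 2.92)
# Military prep school attendees, freshman and sophomore years.
two_year = math_gpa_correction([2.46, 2.56, 2.59, 2.64], 2.85)

print(freshman, two_year)  # -0.37 -0.29
```

Keeping the correction as a separate function makes it straightforward to recompute the offsets for each succeeding class and to see whether they shrink or grow.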

CACS Validation 

As previously noted, the CACS prediction
equations were developed using data available on
3234 students who graduated during the time
period 1967 through 1971. These equations were
subsequently validated on an independent sample
of 754 students who graduated in 1972. The
purpose of the validation procedure was not only
to assess the accuracy of the CACS model, but also
to determine the confidence which counselors
could place in the model's predictions.

It will be recalled that the model generates
predictions for each of 28 majors at three levels of
student maturity. The maturity levels are:

Level A - applicable for students that have
earned over 45 credit hours.

Level B - applicable for students that have
earned 15-45 credit hours.

Level C - applicable for students that have
earned less than 15 credit hours.
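
The three maturity levels amount to a simple threshold rule on earned credit hours. A minimal Python sketch of that rule follows; the function name is hypothetical, and the treatment of exactly 15 and exactly 45 hours as Level B is an assumption based on the "15-45" range stated above.

```python
def maturity_level(credit_hours: float) -> str:
    """Map earned credit hours to a CACS maturity level (A, B, or C)."""
    if credit_hours > 45:
        return "A"  # over 45 credit hours
    if credit_hours >= 15:
        return "B"  # 15 through 45 credit hours (boundary handling assumed)
    return "C"      # fewer than 15 credit hours


for hours in (10, 15, 45, 60):
    print(hours, maturity_level(hours))
```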
Validation Procedure

The validation procedure consisted of generating
upper division major GPA predictions for the
754 graduating students and comparing such
predictions with the students' actual earned GPAs.
It was originally planned to use grades accumulated
over the last four semesters for computing
earned GPAs. However, since final semester grades
were not available when the validation test occurred,
earned GPAs were based on the
first three semesters of upper division work.

The comparison of earned versus predicted
GPA was made for each major that contained at
least one graduate. Only the American Studies
major failed to qualify since it had no graduates in
1972. Twenty-seven, rather than 28, predicted
GPAs were, therefore, generated for each of the
model's maturity levels.

It was apparent from previous analyses that the
statistical efficiency of the model's equations
would vary from major to major and from
maturity level to maturity level. The validation
analysis was specifically aimed at assessing this
differential prediction capability so that counselors
could recognize and use the predictive
potential of the model to the best advantage.
Additionally, the validation analysis was expected
to provide valuable insights regarding avenues for
improving the model.

Results of Maturity Level A Validation 

Table 7 summarizes the results of comparing 
the observed (earned) versus predicted GPAs
across all 754 students for maturity level A. The 
average observed GPA was 3.03; whereas, the 
average predicted GPA was 3.02. This indicates 
that the distribution of predicted values closely 
paralleled the distribution of observed values, 
hence the central tendency calibration of the 
model was excellent. 

Table 7. Observed and Predicted
GPAs for 754 Students
(Maturity Level A)

Student      GPA         GPA          Individual
             Observed    Predicted    Difference

1            2.48        2.67         .19
...
754          3.00        3.32         .32

Overall
Average      3.03        3.02         .28



Central tendency calibration is a complex
resultant of numerous statistical distributions
having an influence on the model. Usually, a
model such as CACS will consistently underestimate
or overestimate actual performance, with
the result that the average predicted value will
deviate significantly from the average observed
value. However, such was not the case for the
CACS Level A Model.

The last column of Table 7 shows a second
attribute of model validity. The difference
between observed and predicted GPAs was
computed for each student and then all 754 difference
scores were averaged (without regard to plus
or minus signs). The average difference score was
found to be 0.28 of a GPA, indicating that Model
Level A provided a high degree of intrinsic accuracy.
Such accuracy, it may be noted, is independent
of central tendency calibration. That is, it is
possible to have high intrinsic accuracy even
though central tendency calibration may be quite
low, and vice versa.
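
The independence of the two validity measures can be seen in a small numerical sketch. The Python below is illustrative only: the function names and the three-student data are hypothetical, not taken from the validation sample. The first case has a perfect central tendency calibration but large individual differences; the second has a constant offset with modest individual differences.

```python
def calibration_offset(observed, predicted):
    """Average predicted GPA minus average observed GPA."""
    return sum(predicted) / len(predicted) - sum(observed) / len(observed)


def mean_absolute_difference(observed, predicted):
    """Average of the differences, taken without regard to sign."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


observed = [2.0, 3.0, 4.0]        # hypothetical earned GPAs

flat = [3.0, 3.0, 3.0]            # well calibrated on average, poor per student
shifted = [2.5, 3.5, 4.5]         # offset by half a point for every student

print(calibration_offset(observed, flat), mean_absolute_difference(observed, flat))
print(calibration_offset(observed, shifted), mean_absolute_difference(observed, shifted))
```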

The distribution of forecasting errors obtained
for maturity Level A is summarized in Table 8.
The median forecasting error was about 0.23 of a
GPA (as contrasted with the mean of 0.28), while
the 75th percentile error registered about 0.38 of a
GPA. Thus, the bulk of the forecasts fell into the
low error category. Only 133 (18 percent) students
out of 754 had forecasting errors that might
be considered large (i.e., 0.46 of a GPA or higher).
(The code numbers for these 133 students have
been extracted and preserved for potential follow-up
analysis. In such an analysis, all other information
available concerning these students would
be researched in order to determine possible
causes of these errors.)
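
A summary of this kind (median, mean, 75th percentile, and the share of large errors) can be produced directly from a list of absolute forecasting errors. The sketch below uses only the Python standard library; the function name and the eight sample errors are hypothetical, and the 0.46 cutoff is the one quoted above. Percentile conventions differ between methods, so the 75th percentile value depends on the default used by `statistics.quantiles`.

```python
import statistics

LARGE_ERROR = 0.46  # cutoff the report treats as a large forecasting error


def summarize_errors(errors):
    """Return median, mean, 75th percentile, and share of large errors."""
    quartiles = statistics.quantiles(errors, n=4)  # three quartile cut points
    large = sum(1 for e in errors if e >= LARGE_ERROR)
    return {
        "median": statistics.median(errors),
        "mean": statistics.fmean(errors),
        "p75": quartiles[2],
        "share_large": large / len(errors),
    }


# Hypothetical absolute errors, for illustration only.
sample = [0.05, 0.10, 0.20, 0.25, 0.30, 0.40, 0.50, 0.60]
summary = summarize_errors(sample)
print(summary)
```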

Comparisons were also made between observed
and predicted GPAs for each of 27 majors
(Table 9). Columns three and four of Table 9 show
the average observed and average predicted GPAs.
The overall averages were 2.98 and 2.98, respectively,
again indicating high central tendency
calibration. Column five presents the differences
between observed and predicted GPAs, and the
overall average difference is shown as only 0.10 of
a GPA. However, the most important measure of
intrinsic accuracy consists of averaging difference
scores for all students within a major (column 6).
The overall average of these scores is shown as
0.27 of a GPA, almost identical to the intrinsic
accuracy score obtained in the previous analysis of
individual students (Table 7).

Some of the majors had relatively few students,
thus yielding small sample sizes. These are

Table 8. Distribution of Differences Between Observed
and Predicted GPAs for 754 Students
(Maturity Level A)

[Figure: frequency histogram of GPA errors for Equation A (vertical axis: frequency; horizontal axis: GPA errors), with the median (50%), the mean, and the point below which 75% of all students fall marked.]



Table 9. Observed and Predicted GPAs for 28 Majors
(Maturity Level A)

                                                      DIFFERENCE SCORES
   N   MAJOR TITLE          OBSERVED   PREDICTED      MEAN   INDIVIDUAL

  41   Aero. Engrg.           3.19       3.17          .02      .24
   *   Amer. Studies            *          *             *        *
  23   Astronautics           3.30       3.37          .07      .22
  17   Basic Sciences         2.91       2.59          .32      .42
   7   Chemistry              3.10       3.11          .01      .24
  44   Civil Engrg.           3.09       3.10          .01      .34
  16   Computer Sciences      2.91       2.88          .03      .23
  54   Economics              3.21       3.09          .12      .33
  15   Elec. Engrg.           3.23       3.27          .04      .27
  82   Engrg. Mgt.            2.61       2.75          .14      .26
   3   Engrg. Sciences        2.77       2.69          .08      .07
   4   Far East Studies       2.71       2.65          .06      .18
  13   General Engrg.         2.50       2.72          .22      .38
  61   General Studies        2.45       2.42          .03      .24
  10   Geography              2.97       2.90          .07      .43
  73   History                3.21       3.14          .07      .29
   8   Humanities             3.00       2.94          .06      .34
  57   Intl. Affairs          3.06       3.00          .06      .24
   2   Latin Am. Studies      3.27       3.24          .03      .07
  71   Life Sciences          3.38       3.28          .10      .25
  20   Mathematics            3.37       3.38          .01      .29
  65   Mechanics              3.24       3.30          .06      .32
   3   Military Science       2.41       2.74          .33      .45
   9   Political Science      2.49       2.88          .39      .38
  23   Physics                3.20       3.17          .03      .25
  17   Psychology             2.98       3.01          .03      .21
   5   Soviet Studies         2.86       2.82          .04      .21
   8   W. Europ. Studies      3.04       2.77          .27      .27

 751   OVERALL AVERAGE        2.98       2.98          .10      .27

       ADJUSTED OVERALL
       AVERAGE                3.04       3.03          .08      .29

Keypunch errors resulted in the loss of data for three students, one each from
the majors Aeronautical Engineering, Electrical Engineering, and Life
Sciences. Time did not permit correction of these errors. However, these
data would not likely provide a perceptible change in the results.

denoted by circles encompassing the major
numbers and were arbitrarily selected on the basis
of being less than 10. Although such small samples
would ordinarily be expected to provide considerable
instability, their elimination from the pool
of data did not alter the overall averages significantly,
as can be seen in the row entitled
"Adjusted Overall Average" in Table 9.
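
The effect of dropping the small-sample majors can be checked from the Table 9 figures. The Python sketch below is an illustration, not the original computation: it assumes the overall averages are unweighted means across majors, and it takes the Engrg. Mgt. count as 82 (the value shown in Table 11, which also makes the counts sum to 751). Under those assumptions the individual difference scores average 0.27 over all 27 majors and 0.29 once majors with fewer than 10 students are removed, matching the two summary rows.

```python
# (number of students, average individual difference score) per major, Table 9.
TABLE_9 = [
    (41, .24), (23, .22), (17, .42), (7, .24), (44, .34), (16, .23),
    (54, .33), (15, .27), (82, .26), (3, .07), (4, .18), (13, .38),
    (61, .24), (10, .43), (73, .29), (8, .34), (57, .24), (2, .07),
    (71, .25), (20, .29), (65, .32), (3, .45), (9, .38), (23, .25),
    (17, .21), (5, .21), (8, .27),
]

MIN_SAMPLE = 10  # majors with fewer students are treated as small samples


def unweighted_mean(scores):
    """Plain average across majors, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


overall = unweighted_mean([score for _, score in TABLE_9])
adjusted = unweighted_mean([score for n, score in TABLE_9 if n >= MIN_SAMPLE])

print(overall, adjusted)  # 0.27 0.29
```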

Results of Maturity Level B Validation 

Maturity Level B forecasting accuracy is
summarized in Table 10. The central tendency
calibration was offset by only 0.03 of a GPA and
the individual differences across all students averaged
0.32 of a GPA. These results correspond
rather closely with those obtained for maturity
Level A, there being a slight decrease in intrinsic
accuracy and central tendency calibration. This
outcome is quite satisfactory, considering that
Level B sacrifices an entire year of information on
a student.

Table 10. Observed and Predicted
GPAs for 754 Students
(Maturity Level B)

Student      GPA         GPA          Individual
             Observed    Predicted    Difference

1            2.48        2.71         .23
...
754          3.00        3.33         .33

Overall
Average      3.04        3.01         .32



Table 11 shows the results obtained for the
Level B forecast analysis by majors. These results
compare favorably with those of Level A, the most
important difference being a decrease in intrinsic
accuracy of 0.06, on the average; i.e., 0.27 versus
0.33.

Results of Maturity Level C Validation 

Table 12 shows the maturity Level C analysis of
observed and predicted GPAs. Surprisingly, the
central tendency calibration remained high; the
offset was only 0.02 of a GPA. However, intrinsic
accuracy continued to decrease as expected, the
average difference score being 0.38. Since the
Level C Model uses no information whatsoever
about a student while attending school, such
accuracy must be viewed as highly satisfactory.



Table 12. Observed and Predicted
GPAs for 754 Students
(Maturity Level C)

Student      GPA         GPA          Individual
             Observed    Predicted    Difference

1            2.48        2.89         .41
...
754          3.00        3.40         .40

Overall
Average      3.04        3.06         .38

Level C forecast analysis by majors is given in
Table 13. Again, the central tendency calibration
was excellent since the offset was only 0.02 of a
GPA. And, as expected, intrinsic accuracy continued
to decrease; the overall average deviation
was 0.37 of a GPA, 0.10 greater than Level A and
0.04 greater than Level B deviations.

Conclusions 

In general, the validation results were quite
positive in nature. Central tendency calibrations
were consistently high and intrinsic accuracy was
excellent, on the average. The former measures
lead us to conclude that the CACS prediction
model is remarkably free of biases which would
cause it to consistently underestimate or overestimate
student performance. With regard to
intrinsic prediction errors, i.e., those reflected in
individual differences between observed and
predicted GPAs, no probabilistic model can hope
to eliminate such errors completely. The average
individual errors of 0.27, 0.33, and 0.37 for Levels
A, B, and C, respectively, appear to be quite tolerable
and should not negate the practical value
that CACS can provide the school counselors.
However, the individual prediction errors in CACS
can undoubtedly be reduced through additional
analysis. The 133 students who demonstrated the
largest deviations from equation forecasts provide
a primary research focal point for such analysis.



IV. DISCUSSION AND RECOMMENDATIONS 
Overview 

The System Development Corporation assem- 
bled and provided an initial computer system for 
the academic counseling. 






Table 11. Observed and Predicted GPAs for 28 Majors
(Maturity Level B)

        MAJOR                                              DIFFERENCE SCORES
   N    NUMBER  TITLE               OBSERVED   PREDICTED   MEAN   INDIVIDUAL

  41    15      Aero. Engrg.          3.19       3.12       .07      .30
   *    20      Amer. Studies           *          *          *        *
  23    07      Astronautics          3.30       3.43       .13      .29
  17    02      Basic Sciences        2.91       2.62       .29      .44
   7            Chemistry             3.10       3.09       .01      .46
  44    10      Civil Engrg.          3.09       3.05       .04      .43
  16    23      Computer Sciences     2.91       2.85       .06      .33
  54    17      Economics             3.21       3.15       .06      .37
  15    11      Elec. Engrg.          3.23       3.30       .07      .32
  82            Engrg. Mgt.           2.61       2.78       .17      .30
   3            Engrg. Sciences       2.77       2.52       .25      .27
   4            Far East Studies      2.71       2.68       .03      .12
  13            General Engrg.        2.50       2.61       .11      .27
  61    21      General Studies       2.45       2.42       .03      .24
  10    19      Geography             2.97       2.67       .30      .47
  73            History               3.21       3.14       .07      .32
   8            Humanities            3.00       2.98       .02      .39
  57            Intl. Affairs         3.06       3.01       .05      .27
   2            Latin Am. Studies     3.27       2.95       .32      .31
  71    24      Life Sciences         3.38       3.32       .06      .26
  20    06      Mathematics           3.37       3.35       .02      .32
  65            Mechanics             3.24       3.21       .03      .36
   3            Military Sciences     2.41       2.72       .31      .46
   9            Political Science     2.49       2.89       .40      .40
  23            Physics               3.20       3.19       .01      .31
  17            Psychology            2.98       3.00       .02      .23
   5            Soviet Studies        2.86       2.75       .11      .27
   8            W. Europ. Studies     3.04       2.77       .27      .32

 751            OVERALL AVERAGE       2.98       2.95       .12      .33

                ADJUSTED OVERALL
                AVERAGE               3.04       3.01       .09      .32

Table 13. Observed and Predicted GPAs for 28 Majors
(Maturity Level C)

        MAJOR                                              DIFFERENCE SCORES
   N    NUMBER  TITLE               OBSERVED   PREDICTED   MEAN   INDIVIDUAL

  41    15      Aero. Engrg.          3.19       3.11       .08      .39
   *    20      Amer. Studies           *          *          *        *
  23    07      Astronautics          3.30       3.39       .09      .39
  17    02      Basic Sciences        2.91       2.58       .33      .49
   7            Chemistry             3.10       3.17       .07      .47
  44    10      Civil Engrg.          3.09       3.07       .02      .46
  16    23      Computer Sciences     2.91       2.86       .05      .34
  54    17      Economics             3.21       3.03       .18      .43
  15    11      Elec. Engrg.          3.23       3.18       .05   [illegible]
  82            Engrg. Mgt.           2.61       2.98       .37      .43
   3            Engrg. Sciences       2.77       2.73       .04      .28
   4            Far East Studies      2.71       3.08       .37      .36
  13            General Engrg.        2.50       2.61       .11      .34
  61    21      General Studies       2.45       2.53       .08      .27
  10    19      Geography             2.97       2.80       .17      .42
  73    08      History               3.21       3.16       .05      .41
   8            Humanities            3.00       3.02       .02      .36
  57            Intl. Affairs         3.06       3.09       .03      .35
   2            Latin Am. Studies     3.27       3.03       .24      .23
  71            Life Sciences         3.38       3.28       .10      .36
  20    06      Mathematics           3.37       3.37       .00      .28
  65            Mechanics             3.24       3.27       .03      .42
   3            Military Sciences     2.41       2.82       .41      .44
   9            Political Science     2.49       2.86       .37      .44
  23            Physics               3.20       3.24       .04      .39
  17            Psychology            2.98       3.06       .08      .30
   5            Soviet Studies        2.86       2.83       .03      .31
   8            W. European Studies   3.04       2.90       .14      .33

 751            OVERALL AVERAGE       2.98       3.00       .13      .37

                ADJUSTED OVERALL
                AVERAGE               3.04       3.03       .10      .38

Keypunch errors resulted in the loss of data for three students, one each from
the majors Aeronautical Engineering, Electrical Engineering, and Life
Sciences. Time did not permit correction of these errors. However, these
data would not likely provide a perceptible change in the results.

This development used the existing data available
on students at the school and existing state-of-the-art
capability in predictive modeling and
interactive computer systems to construct a
useable first model that would enhance counseling
operations as well as provide capability for self-growth
and maturity within the total realm of
career counseling.

Objectives

The objectives of the initial Computerized
Academic Counseling System (CACS) were to
predict the likelihood of success for a student in
any of the major fields of study, and to produce a
predicted point estimate of grade point average
(GPA) for each major and a standard error of
estimate value that could be attached to each
prediction. This information, along with student
supplementary data, was to be made available to
an academic counselor in a responsive manner that
would enhance his ability to conduct effective
counseling.

In general, the objectives of the contracted 
projects were satisfactorily met, considering the 
magnitude and quality of data available. An initial 
mathematical model composed of 84 prediction 
equations divided into three levels of student 
maturity was developed and assessed for its 
academic performance forecasting potential. Also, 
31 potential predictor variables available on 
academic tape files were investigated and con- 
verted into weighted equation parameters appro- 
priate to their degree of unique relationship to 
advanced academic performance. 

The amount of performance variation ac- 
counted for varied considerably between the three 
student maturity levels, being 49 percent for the 
most advanced student Level A, 38 percent for 
student Level B, and 19 percent for student Level 
C. However, all of these levels were considered to 
possess sufficient statistical forecasting efficiency 
to allow their use by an academic counselor. 
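
The percentages above follow the conversion stated in the footnote to the Level C table: the multiple correlation coefficient R is squared and multiplied by 100. A minimal Python sketch of that arithmetic follows. The function name is illustrative and not part of CACS, and the inputs are the Level C average multiple R of .42 and one larger per-major value of .57 from that table.

```python
def percent_variance_explained(multiple_r: float) -> float:
    """Square a multiple correlation coefficient and express it as a percentage."""
    return round(multiple_r ** 2 * 100, 2)


print(percent_variance_explained(0.42))  # 17.64
print(percent_variance_explained(0.57))  # 32.49
```

Squaring an average R is not the same as averaging squared R values, so the first figure need not match the 19 percent quoted for Level C exactly.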

The statistical efficiencies of the forecasting
models also varied considerably from major to
major. Some of this variation was due to the small
sample sizes available for some majors, which did not
allow sufficient stabilization of equations to be
achieved. In other cases, the unique nature of the
major appeared to be involved; e.g., the Computer
Sciences forecasting equations had the poorest
statistical forecasting efficiency in all levels.



Design Rationale 

The five basic design criteria of long life expectancy,
ease of expansion, ease of updating, simplicity
of use, and timely response were met quite
well, considering the relative complexity of the
various components that went into development
of the system and the accelerated production
schedule that had to be maintained in order to
achieve the installation date. The major design
limitations that can be improved upon are as
follows:

1. Increasing the sample sizes used in equation
derivation, especially for those majors that tend to
have fewer graduates.

2. Adding additional pertinent variables for
each student that reflect such factors as motivation,
vocational interest, personality, and
specific area aptitude/achievement indicators.

3. Augmenting the academic success forecasting
model with a parallel model based on
group membership probability.

4. Creating additional models that will tie the
academic performance at the school with industry
manning and replacement factors.

5. Modifying the hardware/software aspects of
the system and its supporting elements to efficiently
route a greater abundance of data to
counselors as well as modeling analysts.

System Expansion

There are three principal areas of potential
system expansion. One is in the direction of
enlarging the scope of academic counseling
through use of additional mathematical models,
hardware, and software to encompass the larger
sphere of career counseling within the manning
requirements of industry.

A second area concerns the improvement of the
efficiency of forecasting performance at the school
by developing additional models while existing
models are improved through augmented data
collection and analysis.

A third area involves the computer efficiency of
the hardware and software systems both within
CACS and the environment in which it can best be
expected to operate. New display devices, capabilities,
and interactive computer system concepts
which are constantly being developed should be
reflected in the CACS system expansion.

Recommendations for Future Research

In the scientific literature on personnel classification
and counseling (e.g., Bennett, Seashore, &
Wesman, 1952; Cattell, 1949; DuMas, 1949; Dunn,
1955; Rulon, 1967; Stewart, 1947; Strong, 1931;
Tatsuoka, 1956), supplementary mathematical
models are discussed that serve to complement the
typical success forecasting model. The most prominent
of these has been referred to as the group
membership modeling approach, designed to
determine the extent to which a person possesses
those particular characteristics that are associated
with typical members of a given occupational
group. The forerunner of this approach was the
excellent work performed by Strong (1931) with
respect to vocational interests.

Group membership modeling theory argues that 
it is not meaningful to estimate the likelihood of 
success in a given occupational area unless the 
person can legitimately be considered a compe- 
titive member of that occupation; i.e., to have a 
profile of abilities, interests and personality char- 
acteristics that fit well with the current members 
of the occupation. 

Frequently, a specific profile characteristic may
be very pertinent to a given field of endeavor yet
show no statistical relationship to performance in
that field. In her study of Brown University
majors, Dunn (1955) found that her equations for
predicting grade point averages of chemistry
majors gave no weight to mathematics ability,
while her equations for history majors gave no
weight to verbal ability. She concluded that individuals
within these majors had self-selected
and, consequently, had become highly homogeneous
on each characteristic. As a result, these
two types of abilities did not differentiate students
within these majors. Apparently the variance
phenomenon called restriction of range of talent
operated on the statistical correlation between
mathematical/verbal factors and chemistry/history
grade performance in the Dunn studies in such a
manner as to attenuate their impact in the equation
model. Dunn also found that mathematics ability
picked up a weight for history majors. This
demonstrates how secondary predictors can
assume more weight in an equation than primary
factors, due to the fact that the correlational
influence of the primary predictors has been
homogenized out by selective factors, while the
influence of the secondary predictors has not.
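
Restriction of range is easy to demonstrate with simulated data. The Python sketch below is a hypothetical illustration, not a reanalysis of the Dunn data: it generates an ability score and a grade that depends on it, then compares the correlation in the full group with the correlation among only those whose ability exceeds a cutoff (standing in for self-selection into a major). All names, the seed, and the cutoff are assumptions chosen for the example.

```python
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
ability = [rng.gauss(0, 1) for _ in range(5000)]
grades = [a + rng.gauss(0, 1) for a in ability]  # grade = ability + noise

full_r = pearson_r(ability, grades)

# Keep only high-ability cases, imitating a self-selected group of majors.
selected = [(a, g) for a, g in zip(ability, grades) if a > 1.0]
restricted_r = pearson_r([a for a, _ in selected], [g for _, g in selected])

print(round(full_r, 2), round(restricted_r, 2))
```

The same predictor that correlates strongly with grades in the full group correlates noticeably less within the selected group, which is the attenuation described above.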

For advanced modeling purposes applicable to
this Computerized Academic Counseling System,
as well as to the general state of the art in computerized
occupational guidance, the following
considerations are proposed.

1. Augment the 31 present predictor variables
in CACS with suitable vocational interest and
personality measures for each student. The
measures chosen should have the quality associated
with recognized instruments like the Strong
Vocational Interest Blank and the Minnesota
Multiphasic Personality Inventory.

2. Conduct a thorough factor analysis of all
predictors so as to reduce the effective statistical
dimensionality of the independent variable
domain.

3. Define major occupational groups into
which all school attendees can be grouped. These
groups should be sensitive to the existing vocations
in industry and be tied to given majors or clusters
of majors. The groups may include past graduates
as well as undergraduates who are essentially
committed to a given field of endeavor, such as
engineering.

4. Combine all students into one large sample
for further analysis. Each student will have a series
of dummy dependent variables showing zeros
except for the group to which he belongs.

5. Conduct systematic multiple regression/
discriminant analyses using the group membership
dummy variables with respect to the major
occupational groups as dependent variables and
the factors isolated in item 2 as independent
variables. Caution: Before an independent variable
is considered for inclusion in this model, it must
contain data values for all students or at least for
almost all students; otherwise, non-Gramian statistical
effects will result, giving misleading group
membership matching equations.

6. Obtain group membership matching
equations with the weights assigned to each factor
so as to maximize the homogeneity of each group
as well as its distinction from all other groups.

7. Convert weights from factors back into
terms of the original measures used as independent
variables. This will provide a more visible method
by which the specific effects of each variable in
the basic data array can be assessed.

8. Create an algorithm for converting equation
results to a probability statement that a given
student possesses the attributes befitting each
major occupational group in the complementary
model.

This algorithm can be obtained in the manner
described in the Success Probability Section of the
CACS Mathematical Model Description in the
appendix.
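
The dummy dependent variables proposed in item 4 can be sketched briefly. In the Python below, the group names and the function name are hypothetical placeholders (the report does not define the occupational groups); each student receives a 1 for the group to which he belongs and zeros elsewhere.

```python
# Hypothetical occupational groups, for illustration only.
GROUPS = ["engineering", "sciences", "humanities"]


def group_membership_dummies(student_group, groups=GROUPS):
    """Return one 0/1 dummy variable per group for a single student."""
    if student_group not in groups:
        raise ValueError(f"unknown group: {student_group}")
    return [1 if g == student_group else 0 for g in groups]


print(group_membership_dummies("sciences"))  # [0, 1, 0]
```

Each column of such dummies would then serve as a dependent variable in the regression or discriminant analyses of item 5.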
APPENDIX: MATHEMATICAL MODEL DESCRIPTION AND MAINTENANCE



Introduction 

This report constitutes one of a series of documents describing the design, development and evaluation of
a Computerized Academic Counseling System (CACS). The present document describes the technical
development of a mathematical model which represents the core of CACS. The purpose of the model is to
predict, during the first two years of academic work, eventual grade point average (GPA) and likelihood of
achieving at least a 2.0 GPA for each student in any of 28 major fields of study. The model is primarily
composed of a set of multiple linear regression equations which utilize input data routinely collected.

A discussion of the mathematical model, as it relates to other components of the CACS system, is
presented in the official R&D contract file.

The Mathematical Model for Grade Point Estimation 

The mathematical model for predicting academic success is designed to generate grade point estimates for
a student at three different levels, depending on how far he has advanced in his school work. Level A is
designed to forecast success for relatively advanced students, Level B for moderately advanced students,
and Level C for relatively new students. All levels are focused on students in their first two years, with the
dependent variable being the major GPA likely to be achieved during the last two years. Separate
mathematical equations are provided within each level for 28 majors that were active during the time period
1967 through 1971. The mathematical form used to generate the point estimate within each level is as
follows:

\[ B X + C = Y \]

where

\[ B = \begin{bmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,31} \\ \vdots & \vdots & & \vdots \\ b_{28,1} & b_{28,2} & \cdots & b_{28,31} \end{bmatrix} \]

is a 28 by 31 matrix of linear regression coefficients (one row for each major, one column for each
predictor). This matrix will contain null coefficients for those predictor variables that are not currently
used in making a point estimate for a given major. See the System Specification (official R & D contract
file) for the values of the coefficients currently stored in the program.

\[ X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{31} \end{bmatrix} \]

is a vector of 31 numerical predictor values for the student being counseled. Similar vectors are compiled
from the CAIDS-PRC data base for active students and are made available to the model on call. All 31
predictor values are available for counselor use, although only a portion of these are actually used in
making the grade point estimates.

\[ C = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_{28} \end{bmatrix} \]

is a vector of 28 regression constants uniquely applicable to each major. These constants are part of the
regression equations (the y-intercepts) computed during statistical analysis.

\[ Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{28} \end{bmatrix} \]

is a resultant vector of 28 grade point estimates, one for each major for the student being processed. If
only one major of interest is specified by the counselor, the 27 remaining estimates are bypassed by the
program.

The B matrix filed in the CACS Prediction Module contains provisions for 31 prediction equation
coefficients for each major. This matrix table is flexible in that any one of the 31 variables forwarded from
the Data Retrieval Module can be weighted. Thus, there is considerable room for equation expansion
without the necessity to overhaul the Prediction Module. In the current mathematical model, however, only
16 of the 31 independent variables are actually weighted in the major GPA prediction equations. The
remaining 15 variables were not weighted because they failed to survive the variable selection criteria
described in Steps 10 through 16 of the Updating Procedures in the discussion paragraph, page 22 (e.g., lack
of correlation with major GPA, redundancy with a statistically superior variable).

After an updated statistical analysis is conducted, regression coefficients for existing predictors can be 
easily changed without redesigning the program. 
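The point-estimate computation can be illustrated with a minimal sketch. It is not the COBOL Prediction Module; the function name `point_estimates` and the toy coefficient values are assumptions made for illustration, and the real B matrix has 28 rows (majors) and 31 columns (predictors).

```python
def point_estimates(B, X, C, major=None):
    """Return grade point estimates Y = B X + C as a dict keyed by major index.

    B: list of rows, one per major, each holding one coefficient per predictor
       (0.0 for predictors that are not weighted for that major).
    X: list of predictor values for one student.
    C: list of regression constants, one per major.
    major: optional row index; when given, the other majors are bypassed.
    """
    rows = range(len(B)) if major is None else [major]
    return {m: sum(b * x for b, x in zip(B[m], X)) + C[m] for m in rows}


# Toy example with 2 majors and 3 predictors (illustrative values only).
B = [[0.5, 0.0, 0.25],
     [0.0, 0.4, 0.0]]
C = [0.5, 1.0]
X = [3.0, 2.5, 2.0]

print(point_estimates(B, X, C))           # estimates for both majors
print(point_estimates(B, X, C, major=1))  # only the second major
```

Because unused predictors simply carry a null (zero) coefficient, updating an equation amounts to changing entries of B and C rather than changing the computation itself.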

Standard Error of Estimate 

Along with the major GPA performance point estimates provided by the elements of the Y-vector, the 
CACS model also provides standard errors of estimate to be used by the counselor in assessing the relative 
confidence that may be placed in the forecasts. The standard error values are computed at the same time as 
the regression equations and are stored in the program as an E-vector containing 28 elements to correspond 
with the Y-vector. Each E-element is defined by the following equation: 



$$E_i = \sqrt{\frac{\sum (Y_{oi} - Y_{ci})^2}{N - M - 1}}$$

where $E_i$ is the standard error of estimate corresponding with the ith element of the Y-vector. 

$Y_{oi}$ is the observed major GPA compiled for each student graduate in the data base used for 
equation derivation. 

$Y_{ci}$ is the computed value of the major GPA when applying the equation to each graduate in the 
data base from which it was derived. 

N is the number of student graduates in the data base used for equation derivation for the specific 
major. 

M is the number of predictors used in the equation to predict $Y_i$. 

The summation is taken over the N graduates in the data base for the major. 



It is assumed that the computed standard error will remain comparable for new students. This 
assumption, however, is contingent on the size of the data base used, the number of predictors applied, and 
the consistency of relationships (correlations) among the predictors and the criterion; i.e., major GPA. A 
special study is being conducted to test the consistency of the standard error on new students. The results 
of this study may lead to an operational correction in the computed standard error in order to make it 
more realistic in actual counseling. 
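As a minimal sketch of the standard error of estimate defined above, the following function takes the observed and computed major GPAs for the graduates of one major. The function name and the sample values are assumptions for illustration only.

```python
import math


def standard_error_of_estimate(observed, computed, n_predictors):
    """Square root of the summed squared residuals divided by N - M - 1."""
    n = len(observed)
    dof = n - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more graduates than predictors plus one")
    ss_residual = sum((yo - yc) ** 2 for yo, yc in zip(observed, computed))
    return math.sqrt(ss_residual / dof)


# Four graduates, one predictor (illustrative values only).
observed = [2.0, 3.0, 2.5, 3.5]
computed = [2.5, 2.5, 3.0, 3.0]
print(round(standard_error_of_estimate(observed, computed, 1), 4))
```

The divisor N - M - 1 shrinks as predictors are added, which is one reason the standard error depends on both the size of the data base and the number of predictors applied.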

Success Probability 

Along with each point estimate and accompanying standard error of estimate, the CACS model involves 
several mathematical transformations to arrive at a determination of success probability on a scale from .01 
to .99. 

The approach to the problem of estimating the likelihood of success for a student in a given major was 
defined on the basis of the standard statistical forecasting theory, which is summarized as follows. The 
estimated major GPA and the standard error of estimate are taken as the parameters (mean and standard 
deviation) defining the most likely distribution of prediction errors for each forecast. The minimum 
academic requirement for acceptable major GPA is set into the context of this distribution, and 
standardized in terms of the normal unit curve. This can be illustrated as follows: 

[Figure: normal curve centered on the equation estimate, with the horizontal axis marked at -3, -2, -1, 
+1, +2, and +3 standard errors of estimate.] 



The portion of the standard normal curve to the right of the minimum academic requirement represents 
the probability that the actual major GPA will fall into the acceptable region. The size of this region relative 
to the total curve becomes the probability of success. In the illustrated example, the minimum academic 
requirement falls at one standard error of estimate below the estimated major GPA; therefore, the region of 
success represents 84 percent of the total curve. The region of failure, i.e., the likelihood that the actual 
GPA will be below minimum requirements, represents 16 percent of the total curve. Thus, the probability of 
success is set at .84. 

The computational algorithm for determining this probability is as follows: 

$$\text{trial probability} = 1 - \tfrac{1}{2}\left(1 + C_1|Z| + C_2|Z|^2 + C_3|Z|^3 + C_4|Z|^4\right)^{-4}$$

where $C_1$ = .196854 
      $C_2$ = .115194 
      $C_3$ = .000344 
      $C_4$ = .019527 

If 0 > Z ≥ -2.17, probability = 1.00 minus trial probability. 

If 0 ≤ Z ≤ 2.17, probability = trial probability. 

If Z > 2.17, probability = .99 and trial probability is not computed. 

If Z < -2.17, probability = .01 and trial probability is not computed. 

A standard ratio defining the deviation of the point estimate from minimum acceptable performance 
is given by: 

$$Z_i = \frac{Y_i - 2.0}{E_i}$$

(Limits of 0 to 4 are set for $Y_i$ in this calculation.) 

where $Y_i$ = the point estimate of major GPA obtained as an element in the Y-vector previously described. 

2.0 = a constant describing the minimum acceptable major GPA at the school. 

$E_i$ = the standard error of estimate corresponding to the point estimate of the ith element in Y. 

The success probability provides an estimate of the likelihood that the student being counselled will be 
able to achieve at least a minimum grade point average of 2.0 in the specified major. 
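The success probability calculation can be sketched as follows. The constants and the 2.17 cutoff are those given in the text; the four constants match a widely published polynomial approximation to the normal curve in which the polynomial is raised to the power -4, and that exponent is assumed here. Clamping the point estimate to the range 0 to 4 is likewise an assumed reading of the stated limits, and the function name is hypothetical.

```python
C1, C2, C3, C4 = 0.196854, 0.115194, 0.000344, 0.019527


def success_probability(estimate, std_error, minimum=2.0, cutoff=2.17):
    """Probability that the actual major GPA reaches the minimum requirement."""
    estimate = min(max(estimate, 0.0), 4.0)  # assumed: limits of 0 to 4 on the estimate
    z = (estimate - minimum) / std_error
    if z > cutoff:
        return 0.99   # trial probability is not computed
    if z < -cutoff:
        return 0.01   # trial probability is not computed
    a = abs(z)
    trial = 1.0 - 0.5 * (1.0 + C1 * a + C2 * a**2 + C3 * a**3 + C4 * a**4) ** -4
    return trial if z >= 0 else 1.0 - trial


# Estimate one standard error above the 2.0 minimum, as in the illustrated example.
print(round(success_probability(2.5, 0.5), 2))
# Estimate one standard error below the minimum.
print(round(success_probability(1.5, 0.5), 2))
```

With the estimate one standard error above the minimum the sketch gives approximately .84, which agrees with the illustrated example in the text.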
Data Base 

The data base for the update is derived from the CAIDS-PRC data files for past graduates of the school 
starting with the class of 1967. The data set for each graduate consists of: 

1. Student Control Number          CAIDS Item MXC2, tape positions 863-868 
2. Major Code                      CAIDS Item MXM1, tape positions 427-428 
3. Potential Predictor Variables   As described in the Potential Predictor Variables paragraph 
4. Major GPA (Variable 32)         As described in the Major GPA (Variable 32) paragraph 

Potential Predictor Variables 

Variable           Statistical 
Number    Source   Limits       Description 

1         PRC      1.00-4.00    Grade point average (GPA) in Mathematics during fourth and 
                                third class years. Calculate the GPA for all Mathematics 
                                courses during fourth and third years. This consists of all 
                                courses with a Department Code Letter of "P" in column 51 
                                on "G" and "A" records. 

2         PRC      1.00-4.00    GPA in other Basic Sciences during fourth and third class 
                                years. These courses are those with a Department Code 
                                Letter of "E", "r', or "S". 

3         PRC      1.00-4.00    GPA in Engineering Sciences during fourth and third class 
                                years. These courses are identified with Department Code 
                                Letters "B", "C", "H", and "R". 

4         PRC      1.00-4.00    GPA in Humanities during fourth and third class years. These 
                                courses are identified with Department Code Letters "J", 
                                "K", and "M". 

5         PRC      1.00-4.00    GPA in Social Sciences during fourth and third class years. 
                                These courses are identified with Department Code Letters 
                                "G", "L", "U", "N", and "D". 

6         PRC      1.00-4.00    GPA in Mathematics during fourth class year. (Otherwise 
                                same as Variable 1.) 

7         PRC      1.00-4.00    GPA in other Basic Sciences during fourth class year. 
                                (Otherwise same as Variable 2.) 

8         PRC      1.00-4.00    GPA in Engineering Sciences during fourth class year. 
                                (Otherwise same as Variable 3.) 

9         PRC      1.00-4.00    GPA in Humanities during fourth class year. (Otherwise same 
                                as Variable 4.) 

10        PRC      1.00-4.00    GPA in Social Sciences during fourth class year. (Otherwise 
                                same as Variable 5.) 

11        CAIDS    0-1          Falcon or Skelly Scholarship. Item MXFS (Pos. 18). If item 
                                contains an "F" or "S", this variable will have a value of 
                                "1"; "0" for all other students. 

12        CAIDS    0-2          Turnback Indicator. If item MXD3 (Pos. 72, 73) contains a 
                                year, this variable will have a value of "2". 
                                If item MXD? (Pos. 70, 71) contains a year and item MXD3 is 
                                blank, this variable will have a value of "1". 
                                If both above items are blank, this variable will have a 
                                value of "0". 

13        CAIDS    20-25        Age at Graduation. Pos. 442, 443 of item MXT2 minus Pos. 93, 
                                94 of item MXDB. 

14        CAIDS    400-800      Prior Academic Achievement. Item MXPR (Pos. 797-799). 

15        CAIDS    400-800      Verbal Aptitude. Item MXVA (Pos. 800-802). 

16        CAIDS    400-800      English Composition. Item MXEN (Pos. 803-805). 

17        CAIDS    900-1600     Composite English Score. Item MXEC (Pos. 806-809). 

18        CAIDS    500-800      Math Aptitude. Item MXMA (Pos. 810-812). 

19        CAIDS    1-2          Intermediate or Advanced Math Code. Item MX?A (Pos. 813). 
                                If coded "A", convert to "2"; if "I", convert to "1". 

20        CAIDS    500-800      Math Achievement. Item MXMV (Pos. 814-816). 

21        CAIDS    1000-1600    Composite Math Score. Item MXMC (Pos. 817-820). 

22        CAIDS    2600-4000    Academic Composite. Item MXAC (Pos. 821-824). 

23        CAIDS    400-800      PAE Score. Item MXPA (Pos. 825-827). 

24        CAIDS    300-800      Activities - Athletic. Item MXAA (Pos. 828-830). 

25        CAIDS    300-800      Activities - Nonathletic. Item MXAN (Pos. 831-833). 

26        CAIDS    1200-2400    Leadership Composite. Item MXLD (Pos. 834-837). 

27        CAIDS    500-800      Weighted Composite. Item MXWC (Pos. 838-841). Delete the 
                                least significant digit shown in the student Master Tape. 
                                Then all variables obtained from this tape will be scaled 
                                as integers. 

28        CAIDS    1-4          Medical Qualification Code. Item MXMD (Pos. 842). If this 
                                column contains a letter, convert to a number as follows: 

                                A = 1     S = 1 
                                B = 2     T = 2 
                                C = 3     U = 3 
                                D = 4     V = 4 
                                E = 5     W = 5 
                                F = 6     X = 6 

                                All other letters = 7 

29        CAIDS    0-1          Military Prep School Attended Code. Item MXPP (Pos. 843). 
                                If item contains an "A", convert to "1"; all others convert 
                                to "0". 

30        CAIDS    0-1          Other Prep School Attended Code. Item MXPP (Pos. 843). If 
                                item contains a "P", convert to "1"; all others convert to 
                                "0". 

31        CAIDS    0-1          College Attended Code. Item MXCO (Pos. 844). If item 
                                contains a "C", convert to "1"; all others convert to "0". 
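The letter-to-number conversions described for the coded predictor variables can be sketched as simple lookups. The function names below are hypothetical, and only the conversions spelled out in the variable descriptions (Variables 28 through 31) are covered.

```python
# Letter-to-number conversion for the Medical Qualification Code (Variable 28).
MEDICAL_CODE_VALUES = {
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6,
    "S": 1, "T": 2, "U": 3, "V": 4, "W": 5, "X": 6,
}


def medical_qualification(letter):
    """Convert a medical qualification letter; all other letters become 7."""
    return MEDICAL_CODE_VALUES.get(letter.upper(), 7)


def prep_school_flags(mxpp):
    """Return (military_prep, other_prep) for Variables 29 and 30 from item MXPP."""
    return (1 if mxpp == "A" else 0, 1 if mxpp == "P" else 0)


def college_attended(mxco):
    """Return Variable 31 from item MXCO: 1 for "C", 0 otherwise."""
    return 1 if mxco == "C" else 0


print(medical_qualification("C"), medical_qualification("V"), medical_qualification("Q"))
print(prep_school_flags("A"), prep_school_flags("P"), prep_school_flags(" "))
print(college_attended("C"), college_attended(" "))
```

Because Variables 29 and 30 are both read from the same item (MXPP), a student can be flagged for at most one of the two prep school indicators.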



Major GPA (Variable 32) 

The procedure is designed to obtain major GPA during the second and first class years. The Academic 
curriculum handbook for 1971 through 1972 was used as a guideline in developing this procedure. 

The student Master File is merged with the PRC File to create a card image data file for each year 
starting with 1967. The merged file is arranged in ascending order by student identification number for 
both graduates and nongraduates. Then, the 31 potential predictor variables and major GPA are calculated 
for all students except those that have no: (a) CAIDS data, (b) PRC data, (c) graduation indication (Item 
MXT2, Pos. 437-443), and (d) major indication (Item MXM1, Pos. 427-428). 

In calculating the major GPA, the "Y" card is used to determine the second and first class years. The 
"G" and "A" cards following the "Y" card provide the courses, grades, and hours credited the student 
during the second and first class years. 

The information obtained from the "G" and "A" card is as follows: 

1. Col. 51 Department Code Letter, which identifies the course 

2. Col. 53 Letter Grade 

3. Col. 55-58 Credit Hours (3 decimal places) 

The Department Code Letter found in Col. 51 is compared to the course code determined for that 
major. Table 14 contains the 28 majors, the two-digit school code, and the course code within each major. 
If the Department Code found on the "G" and "A" card is a Department Code defined for that major in 
Table 14, then the grade data for the course is included in the calculation for that major GPA; otherwise it 
is ignored. 
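The course selection rule and the credit-hour weighting used for major GPA (the sum of hours times grade points, divided by the sum of hours) can be sketched as follows. The record layout as a tuple and the function name are assumptions for illustration; the department codes for each major come from Table 14, which is not reproduced here.

```python
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def major_gpa(course_records, major_dept_codes):
    """Credit-hour weighted GPA over the courses that belong to the major.

    course_records: iterable of (dept_code, letter_grade, credit_hours) taken
                    from the "G" and "A" cards for the second and first class years.
    major_dept_codes: set of Department Code Letters defined for the major.
    Returns None when no qualifying course is found.
    """
    total_points = 0.0
    total_hours = 0.0
    for dept, grade, hours in course_records:
        if dept not in major_dept_codes:
            continue                      # course is not part of this major
        if grade not in GRADE_POINTS:
            continue                      # all other letter grades are ignored
        total_points += hours * GRADE_POINTS[grade]
        total_hours += hours
    return total_points / total_hours if total_hours else None


records = [("P", "A", 3.0), ("P", "C", 3.0), ("J", "F", 3.0), ("P", "W", 2.0)]
print(major_gpa(records, {"P"}))
```

In this toy example only the two graded "P" courses count, so the result is the average of an A and a C weighted equally by their three credit hours.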



The computational formula is: 

$$Y = \frac{\sum HG}{\sum H}$$

where Y = Major GPA (Statistical Limits: 1.00-4.00) 
      H = credit hours for the course 
      G = converted letter grade (A=4, B=3, C=2, D=1, F=0; all other letter grades are ignored) 

Updating Procedures 

The following steps describe the mathematical model updating procedures. 

Step Procedure 

1 The data set for each graduate is compiled from the CAIDS-PRC files for class years 1967 through 
19xx. 

2 All the above data sets are sorted and counted by graduation years within each major. 

3 The mean major GPA for each major for each year is computed. 

4 Any sharp change in mean GPA, that can be attributed to some factor other than sampling 
fluctuation, is studied for possible implications that could lead to the deletion of certain years 
from the subsequent calculations for a major. Drastic changes in grading standards or 
requirements for a major are among the potential causes for this type of action. Another could 
be the deactivation of a major, which has had graduates in the past. Still another could be 
insufficient data for a major to permit confidence in the equations derived. 

There is no absolute rule for dictating what the minimum number of students should be before 
an equation solution for a major is attempted. SDC experience recommends a sample size of 
approximately 50 or more students as a reasonable working limit. Exceptions to this rule may 
have to be made to keep all majors in the model. 

5 After the student data sets are screened for inclusion/exclusion of graduating years and majors, 
they are grouped by majors for further testing concerning prep school attendance. Special SDC 
studies have revealed that the actual values for certain predictor variables used by CACS are 
inflated by prep school attendance. The variables isolated so far are GPA-Mathematics fourth 
class year, and GPA-Mathematics fourth and third class years. The grade bias in these variables 
is revealed when computing the fourth class year and the fourth plus third class years mean 
GPA for graduates who attended prep school and those who did not in the following subject 
matter fields: Basic Sciences (other than Math), Social Sciences, Engineering Sciences, 
Humanities, and Mathematics. 

The mean GPA in Mathematics for prep school attendees has been shown to exceed the 
composite mean GPA relative to the other four subject matter areas. A correction in the CACS 
program and an adjustment in the data base used for model construction is recommended in 
order to reduce the effect of this bias on the CACS predictions and future equations to be 
computed. The preferred procedure is to eliminate all prep school data sets from inclusion in 
future regression computations because they serve to distort the true effect of the mathematics 
variables and perhaps other variables on major GPA performance. 

If the deletion of prep school attendees works a hardship on the sample size for a major that 
does not produce many graduates, an alternative procedure may be invoked. Determine a 
correction in the mathematics grade scores used by CACS for predicting major GPA 
performance for students who have attended military prep school. This correction is computed 
as follows: 

                         4th Class      4th & 3rd 
                         Year           Class Years 
                         Mean GPA       Mean GPA 

Basic Sciences           2.48           2.46 
Social Sciences          2.51           2.56 
Engineering Sciences     *              2.59 
Humanities               2.67           2M 
Mean GPA Non-Math        2.55           2.56 
Mean GPA Math            2.92           2.85 
Correction               -.37           -.29 

Thus, CACS would reduce the Math 4 GPA by .37 and the Math 4+3 GPA by .29 before using 
these variables in predicting the major GPA for a student who attended military prep school. 

For a student who attended a non-military prep school, a smaller adjustment would be made as 
follows: 





                         4th Class      4th & 3rd 
                         Year           Class Years 
                         Mean GPA       Mean GPA 

Basic Sciences           2.49           2.54 
Social Sciences          2.60           2.60 
Engineering Sciences     *              2.43 
Humanities               2.70           2.65 
Mean GPA Non-Math        2.60           2.56 
Mean GPA Math            2.70           2.66 
Correction               -.10           -.10 

The asterisk indicates that this major is not used in 4th class year correction due to lack of data 
for this variable on most students. 

Thus, CACS would reduce the Math 4 GPA by .10 and the Math 4+3 GPA by .10 before using 
these variables in predicting the major GPA for a student who attended a non-military prep 
school. 

There would be no adjustment for previous college attendees or non-preps as they would be 
considered as normative students for the purpose of the model. 
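The correction arithmetic and its application can be sketched as follows. The correction values are those quoted in the text (.37 and .29 for military prep school, .10 and .10 for non-military prep school); the dictionary keys, function names, and the example student are assumptions for illustration.

```python
# Corrections quoted in the text, keyed by prep school type and math variable.
CORRECTIONS = {
    "military": {"math4": -0.37, "math43": -0.29},
    "other":    {"math4": -0.10, "math43": -0.10},
}


def correction(mean_gpa_non_math, mean_gpa_math):
    """Correction = mean non-math GPA minus mean math GPA (negative when math is inflated)."""
    return round(mean_gpa_non_math - mean_gpa_math, 2)


def adjusted_math_gpa(gpa, prep_type, variable):
    """Apply the correction; previous college attendees and non-preps are not adjusted."""
    return round(gpa + CORRECTIONS.get(prep_type, {}).get(variable, 0.0), 2)


# Reproduce the military prep school corrections from the reported means.
print(correction(2.55, 2.92), correction(2.56, 2.85))
# A hypothetical military prep attendee with a fourth class year math GPA of 3.20.
print(adjusted_math_gpa(3.20, "military", "math4"))
# A student with no prep school attendance is left unchanged.
print(adjusted_math_gpa(3.20, "none", "math4"))
```

Since the corrections may shrink or grow with succeeding classes, the table of values would be recomputed from the current data base rather than hard-coded as it is in this sketch.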

As a result of including the prep school attendees in the initial CACS model derivations, the 
statistical computations used to derive the coefficients assigned to Math 4 + 3 GPA in Level A 
and to Math 4 GPA in Level B were somewhat contaminated. This contamination may have 
extended in lesser degree to other coefficients that were influenced by the inclusion of what 
was regarded as non-normative types of student graduates in the derivations. Since the initial 
model was statistically determined on the basis of a preponderance of acceptable normative 
graduates and numerous non-math predictor variables, it was not considered cost-effective to 
perform a recalculation of the initial model. 

However, when the model is updated with new graduates, any graduate who attended any type 
of prep school should be deleted from the statistical computations used to derive the equations. 
If this procedure works a hardship on the sample size for certain majors such that it dips 
considerably below 50, an alternative approach may be used. Here one would keep the prep 
school attendee in the statistical calculations for the given major but would reduce his Math 4 
and Math 4+3 GPAs by the appropriate corrections described earlier. 

It is possible that the predictor variable adjustments described above could disappear or become 
larger with succeeding classes of graduates. Therefore, the statistical correction procedures 
should always be applied during CACS mathematical model maintenance. 

For the next step, it will be assumed that all inappropriate graduated student data sets will 
either have been eliminated from the data base or suitably corrected as described above. 

6 Obtain data listings by majors for each variable in the data set. These listings are screened to 
assure that majors are not mixed and that reasonable data values are available for each variable. 
The statistical limits shown in the previous section can be used as guidelines for screening data 
values. 

7 Using a conventional product moment correlation analysis program, compute for each major a 
32 by 32, variable pair by variable pair, symmetric correlation matrix (R) where each element 
$r_{ij}$ is defined as follows: 

$$r_{ij} = \frac{N_{ij}\sum X_i X_j - \sum X_i \sum X_j}{\sqrt{\left[N_{ij}\sum X_i^2 - \left(\sum X_i\right)^2\right]\left[N_{ij}\sum X_j^2 - \left(\sum X_j\right)^2\right]}}$$

where i is the row subscript (1 ≤ i ≤ 32) 

j is the symmetric column subscript (1 ≤ j ≤ 32) 

$X_i$ and $X_j$ are the paired data values of variables i and j 

$N_{ij}$ is the number of paired values for the element being computed 

It is desirable in the above calculation that the subscript 32 be assigned to the dependent 
variable for the model, which is major GPA. This will make it easier, along with the symmetric 
format, to observe the respective correlations of each independent variable with major GPA and 
will expedite the screening of independent variables as potential predictors. When the $N_{ij}$ for 
computing an element in the R matrix falls considerably below 50, that element and the data 
variable responsible for this condition should be flagged for caution and possible exclusion from 
final equation derivation. 
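A minimal sketch of the pairwise correlation step is given below. It uses the conventional raw-score product moment formula over only those graduates who have both values present, and it flags elements whose pair count is below a caution threshold. Representing missing values as `None`, the threshold of 50, and the function names are assumptions for illustration.

```python
import math


def pairwise_correlation(xs, ys):
    """Return (r, n) using only pairs in which both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None, n
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    syy = sum(y * y for _, y in pairs)
    sxy = sum(x * y for x, y in pairs)
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    if denominator == 0:
        return None, n                    # a variable with no spread has no correlation
    return (n * sxy - sx * sy) / denominator, n


def correlation_matrix(columns, caution_n=50):
    """Build the symmetric matrix R and list the (i, j, n) elements with too few pairs."""
    k = len(columns)
    R = [[None] * k for _ in range(k)]
    flagged = []
    for i in range(k):
        for j in range(i, k):
            r, n = pairwise_correlation(columns[i], columns[j])
            R[i][j] = R[j][i] = r
            if n < caution_n:
                flagged.append((i, j, n))
    return R, flagged


columns = [[1, 2, 3, 4], [2, 4, 6, 8], [3, 2, None, 1]]
R, flagged = correlation_matrix(columns)
print(R[0][1], R[0][2], len(flagged))
```

In practice the dependent variable (major GPA) would be placed last, so that the final row and column of R show each predictor's correlation with major GPA.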

8 Using the 28 correlation matrices computed in Step 7 (there is a different one for each major) 
and a standard linear regression analysis program, compute for each independent (predictor) 
variable within each major and level: 

a. the standardized partial regression coefficient (beta coefficient) with respect to the 
dependent variable, major GPA 

b. the correlation coefficient with the dependent variable 

c. the product of the beta and correlation coefficients 

d. the correlation coefficient with every other nominated independent variable 

Also for each equation, compute the mean, standard deviation, and unstandardized 
B-coefficient for each independent variable; the regression constant (Y-intercept), the 
coefficient of determination, the multiple correlation coefficient, and the standard error of 
estimate; and the mean, standard deviation, and sample size for the dependent variable (major 
GPA). 

In the first pass through the regression analysis and trial equations that will result, nominate as 
independent variables only those predictors currently having non-null coefficients in the 
B-matrix for each level. If all 28 majors that were active during the 1967 through 1971 period 
are preserved in the analysis, there will be 3 levels times 28 equations within each level, or a 
total of 84 regression analyses in the first pass of the updating process. 
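The quantities listed in Step 8 can all be derived from a correlation matrix plus the means and standard deviations of the variables. The sketch below uses the standard textbook relationships (beta coefficients from solving the predictor intercorrelation system, the coefficient of determination as the sum of beta times correlation products). It is not the regression program used by SDC; the function names are hypothetical, and the standard deviations are assumed to be computed with an N - 1 divisor.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are redundant")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def regression_from_correlations(R, predictors, dep, means, sds, n):
    """Derive the Step 8 outputs for one equation from correlation matrix R."""
    r_xx = [[R[i][j] for j in predictors] for i in predictors]
    r_xy = [R[i][dep] for i in predictors]
    betas = solve(r_xx, r_xy)                        # standardized coefficients
    products = [b * r for b, r in zip(betas, r_xy)]  # beta x correlation products
    r_squared = sum(products)                        # coefficient of determination
    b_unstd = [beta * sds[dep] / sds[i] for beta, i in zip(betas, predictors)]
    constant = means[dep] - sum(b * means[i] for b, i in zip(b_unstd, predictors))
    m = len(predictors)
    see = sds[dep] * math.sqrt((1 - r_squared) * (n - 1) / (n - m - 1))
    return {"betas": betas, "products": products, "r_squared": r_squared,
            "multiple_r": math.sqrt(r_squared), "b": b_unstd,
            "constant": constant, "std_error": see}


# Two predictors (indices 0 and 1) and the dependent variable (index 2); toy values.
R = [[1.0, 0.5, 0.6], [0.5, 1.0, 0.5], [0.6, 0.5, 1.0]]
result = regression_from_correlations(R, [0, 1], 2, [3.0, 2.5, 2.8], [0.5, 0.4, 0.6], 60)
print({k: result[k] for k in ("betas", "r_squared", "constant")})
```

The list of products returned here is the same beta times correlation quantity that is screened later in the procedure, which is why the products sum to the coefficient of determination.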

9 Examine the outputs of each regression analysis to determine whether any current predictor 
variable should be eliminated from subsequent passes through the analysis. The criteria for 
eliminating variables will be described in Steps 10 through 14. 

10 First make a correlation sign check to determine if the direction of the correlation of each 
nominated independent variable is in the proper direction. All correlation signs with respect to 
major GPA should be positive except for the following variables, which have a logical negative 
relationship with the dependent variable: 

a. estimated age at graduation 

b. turnback indicator 

c. medical qualification code 

d. Falcon/Skelly scholarship fund 

e. military prep school attendance indicator 

f. other prep school attendance indicator 

Note: The variables listed in b through f were not used as predictors in the initial model. 

11 Next, make a correlation magnitude test. To exceed chance occurrence probability, the 
magnitude should be greater than 2 divided by the square root of $N_{ij}$, where $N_{ij}$ is the number 
of paired data values used to compute the correlation coefficient in question. The correlation 
magnitude criterion may be raised still higher at the discretion of the analyst in the event that 
there are numerous nominated independent variables that pass the minimum magnitude test. 

12 For all independent variables that pass the correlation sign and magnitude tests, apply the 
standardized partial regression coefficient (beta coefficient) sign check. The sign of the beta 
coefficient should correspond with the sign of the correlation coefficient. If it does not, there 
are probably overlapping predictor variables in the regression analysis. One or more variables 
should be eliminated to remove the overlap. The choice of the variable(s) to be removed in this 
case depends on extra-statistical considerations. 

13 After the beta sign check, examine the standardized partial regression coefficients for 
magnitude. Ideally, if all predictors were truly independent of each other, the magnitude of the 
beta coefficient should approximate the magnitude of the correlation coefficient. However, due 
to partially overlapping variables, the magnitude of a beta coefficient can fall considerably 
below the correlation coefficient, even at times assuming an illogical sign, when the overlap is 
high relative to the correlation of the competing independent variables with the dependent 
variable. 

When a relatively large number of independent variables is nominated and the $N_{ij}$ on which the 
correlations for these variables are based falls considerably below 50, the beta coefficients may 
assume distorted magnitudes that are considerably higher than those of the correlation 
coefficients. This condition often disappears as the number of independent variables is reduced 
or the sample size is increased. No absolute rule exists for matching the appropriate number of 
independent variables to sample size. However, SDC experience recommends that no more than 
one predictor be employed for each ten student graduate data sets used to derive an equation. 

For the beta magnitude test, a useful approximation is the one described in Step 11 for the 
correlation magnitude; namely, 2 divided by the square root of $N_{ij}$, where $N_{ij}$ is the number of 
paired values used to compute the correlation coefficient from which the beta coefficient is 
derived. If the independent variable has a beta coefficient that is less than this magnitude, it 
probably will have too little impact on the dependent variable to be worth including in the 
equation. Sometimes, however, variables like this are left in because they may have considerable 
face validity for an equation. If the variable has a sizeable correlation coefficient and a 
moderately low beta coefficient, it is desirable to keep it in the equation, since it will have a 
combined impact that is not trivial, as will be seen in the next step. 

14 For variables that survive the correlation/beta coefficient sign and magnitude tests, apply the 
correlation x beta coefficient product test. As implied in the previous steps, the sign of the 
product should always be positive, which indicates a consistency of signs for both the 
correlation coefficient and the beta coefficient. No absolute rule exists concerning the 
magnitude of the product to be desired. However, SDC experience recommends a minimum 
value of .01 in the product of the two coefficients. Such a value corresponds to a 1 percent 
impact of the predictor variable in the statistical determination of the dependent variable, 
namely, major GPA performance. 
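The screening tests of Steps 10 through 14 can be collected into one check per predictor, as in the sketch below. The function names and the returned labels are hypothetical. The beta magnitude result is best read as advisory, since the text allows a variable with a sizeable correlation or strong face validity to be retained.

```python
import math


def screen_predictor(r, beta, n_pairs, expect_negative=False, min_product=0.01):
    """Return the list of screening tests a predictor fails (empty if it survives).

    r: correlation of the predictor with major GPA
    beta: standardized partial regression coefficient
    n_pairs: number of paired values behind the correlation
    expect_negative: True for variables with a logical negative relationship
    """
    failures = []
    threshold = 2.0 / math.sqrt(n_pairs)
    expected_sign = -1 if expect_negative else 1
    if r * expected_sign <= 0:
        failures.append("correlation sign")        # Step 10
    if abs(r) <= threshold:
        failures.append("correlation magnitude")   # Step 11
    if r * beta <= 0:
        failures.append("beta sign")               # Step 12
    if abs(beta) < threshold:
        failures.append("beta magnitude")          # Step 13 (advisory)
    if r * beta < min_product:
        failures.append("product")                 # Step 14
    return failures


def max_predictors(n_graduates):
    """Rule of thumb from the text: one predictor per ten graduate data sets."""
    return n_graduates // 10


print(screen_predictor(0.45, 0.30, 100))
print(screen_predictor(0.15, 0.05, 100))
print(screen_predictor(-0.30, -0.25, 64, expect_negative=True))
print(max_predictors(57))
```

With 100 paired values the magnitude threshold is .20, so the second example fails the correlation magnitude, beta magnitude, and product tests even though its signs are consistent.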

15 After the model equations are purged of any undesirable predictors in Steps 10 through 14, the 
CACS model analyst may introduce and test previously unused predictor variables into the 
competition for space in the regression model. Ideally, variables should be introduced into an 
existing equation one at a time because the resulting effects on the regression coefficients are 
easier to observe. The regression coefficient for a predictor variable is mathematically sensitive 
to what other predictor variables are included in the trial equation solution. If many new 
variables are introduced simultaneously, it becomes difficult to isolate the particular effects 
produced by specific variables. 

16 For economic as well as scientific parsimony considerations, new variables should be carefully 
screened before attempting to introduce them into a regression equation. There are four basic 
criteria used for such screening. These are: 

a. Appropriateness of Variable. Is the variable an appropriate predictor for the level of the 
model being analyzed? For example, Mathematics GPA accrued during the fourth class 
year is an appropriate predictor for a Level B equation, but not for a Level A or Level C 
equation. For Level A, Mathematics GPA accrued during the fourth and third class years 
is undoubtedly a superior predictor. For Level C, neither predictor is suitable, although 
they can be compiled for all graduates, because they will not be available for a new 
student during his first semester when the Level C equations must be applied. 

b. Data Quality. Is data available and usable as a predictor for a majority of the students? Is 
the data reliable and based on a proven measurement device and procedure? Is the data 
quantitative rather than qualitative in nature? Is there confidence in what the data 
purport to measure; i.e., would different analysts ascribe similar significance to high, 
medium, and low data values? 

c. Correlation with Performance. Does the variable correlate beyond chance expectation 
with major GPA? Does it correlate higher than those predictors already in the equation? 
The correlation matrix computed in Step 7 can be consulted for this purpose. 

d. Independence as to Current Predictors. Does the variable correlate substantially less with 
the current predictors than with major GPA? The correlation matrix computed in Step 7 
can be consulted for this purpose. If the new variable is highly correlated with existing 
predictors it will, when introduced into a trial solution, contribute little or nothing to the 
composite accuracy of the equation and may in fact disturb an existing solution so that 
illogical signs and distorted magnitudes in beta coefficients will appear. 

Because a substantial number of the variables in the CACS data set are composites of other 
variables, caution should be exercised before introducing a composite when one or more of its 
components or a similar variable is already in the existing equation model. Criteria c and d 
should be carefully applied, and if it is still desirable to attempt to introduce the composite into 
the model, it should be done one variable at a time while observing what happens to the 
regression coefficients already in the equation. This procedure would be recommended, for 
example, if the analyst desired to introduce a variable like the Composite English Score into an 
equation already containing Verbal Aptitude. 

Discussion 

The foregoing description of CACS mathematical model maintenance is based on standard regression 
analysis principles that have been proven for many years. The application of these principles should lead to 
satisfactory service from the model. However, there are other approaches to model construction that may 
be considered in this context. Some of these may appear to be fascinating potential alternatives to the 
standard regression approach, but caution must be exercised before an alternative approach is substituted 
for the one currently installed. The comparison of assets and liabilities of competing approaches should be 
carefully investigated by thorough and conclusive scientific testing based on such criteria as relative cost, 
accuracy, and efficiency. 

For example, a model could be constructed that forced a prediction weight for all or most of the 31 
predictor variables in the student data set. This has certain appealing features associated with the 
redundancy of the data provided by the CAIDS-PRC files. Presumably, the model would use all of the data 
that is available to it. The equation weights for such a model could be derived by a combination of factor 
analysis and regression analysis procedures similar to those discussed (Burket, 1964; Herzberg, 1969; Horst, 
1941; Leiman, 1951). 

Conceivably, a model like this might be somewhat more accurate than the standard regression model, 
especially when the sample size for a major is relatively small. Whether the net accuracy accrued across all 
majors and levels would be justified by the additional data analysis upkeep incurred is unknown at present. 
Only a special and thorough investigation based on rigorous, comparative analysis will be able to answer this 
question or related questions concerning other modeling approaches that might appear to have potential. 



46 



Unclassified 
Security Classification 

DOCUMENT CONTROL DATA - R&D 

(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified) 

1. ORIGINATING ACTIVITY (Corporate author) 
   System Development Corporation 
   2500 Colorado Avenue 
   Santa Monica, California 90406 

2a. REPORT SECURITY CLASSIFICATION 
   Unclassified 

2b. GROUP 
   N/A 

3. REPORT TITLE 
   AUTOMATIC DATA PROCESSING SYSTEM AND PROCEDURES 
   COMPUTERIZED ACADEMIC COUNSELING SYSTEM 

4. DESCRIPTIVE NOTES (Type of report and inclusive dates) 
   Final Report 

5. AUTHOR(S) (First name, middle initial, last name) 
   Henry J. Zagorski        Gloria L. Grace 
   Lemont E. Southworth     Russell L. Smith 
   Stuart E. Charlston 

6. REPORT DATE 
   June 1973 

7a. TOTAL NO. OF PAGES 
   46 

7b. NO. OF REFS 
   8 

8a. CONTRACT OR GRANT NO. 
   F41609-71-C-0028 

b. PROJECT NO. 
   ILIR 

c. ILIR-00 
   ILIR-00-31 

9a. ORIGINATOR'S REPORT NUMBER(S) 
   AFHRL-TR-73-6 

9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report) 

10. DISTRIBUTION STATEMENT 
   Approved for public release; distribution unlimited. 

11. SUPPLEMENTARY NOTES 
   None 

12. SPONSORING MILITARY ACTIVITY 
   Air Force Human Resources Laboratory 
   Attn: AFHRL/EDA 
   Brooks Air Force Base, Texas 78235 

13. ABSTRACT 



This report provides a technical analysis and review of the Computerized Academic Counseling System (CACS) 
designed and developed by the System Development Corporation. The system was constructed to assist counselors in 
guiding undergraduate college students toward the selection of optimal academic majors. 

Problem review and definition, system analysis, design rationale, methodological approach, measurement 
specifications, data base compilation, mathematical modeling, statistical results, and validation tests are presented in 
various degrees of detail. Counseling application directions, capabilities, and potential are described. 

Computerized academic counseling is discussed in the context of career success likelihood. Recommendations for 
extending the approach to include additional aspects of career guidance are made. 

A concept for an Air Force career counseling system that effectively permits officers and airmen to shape their 
own careers is discussed. Functional components of the system include: (a) an Air Force personnel needs and resources 
forecast model, (b) a data base for the development and continuous support of the model, and (c) an Air Force 
mechanism which permits personnel to select careers of their choice and offers assurance that such careers will be 
obtained. Preliminary analyses indicate that such a system is entirely feasible and could have significant positive impact 
on Air Force enlistment and turnover rates. Recommendations are presented which suggest appropriate initial research 
and development stages. 



14. KEY WORDS 

counseling 
career counseling 
vocational guidance 
predicting university success 
academic grade forecasting 
university major differentiation 
computerized counseling 
automatic occupational guidance 
forecast model 
personnel turnover 
personnel recruitment 






